Abstract

For the Gaussian sequence model, we obtain non-asymptotic minimax rates of estimation of the linear, quadratic and the $\ell_2$-norm functionals on classes of sparse vectors and construct optimal estimators that attain these rates. The main object of interest is the class $B_0(s)$ of $s$-sparse vectors $\theta=(\theta_1,\dots,\theta_d)$, for which we also provide completely adaptive estimators (independent of $s$ and of the noise level $\sigma$) having only logarithmically slower rates than the minimax ones. Furthermore, we obtain the minimax rates on the $\ell_q$-balls $B_q(r)=\{\theta\in\mathbb{R}^d:\|\theta\|_q\le r\}$, where $0<q\le 2$ and $\|\theta\|_q=\big(\sum_{i=1}^d|\theta_i|^q\big)^{1/q}$.

This analysis shows that there are, in general, three zones in the rates of convergence that we call the sparse zone, the dense zone and the degenerate zone, while a fourth zone appears for estimation of the quadratic functional. We show that, as opposed to estimation of $\theta$, the correct logarithmic terms in the optimal rates for the sparse zone scale as $\log(d/s^2)$ and not as $\log(d/s)$. For the class $B_0(s)$, the rates of estimation of the linear functional and of the $\ell_2$-norm have a simple elbow at $s=\sqrt d$ (boundary between the sparse and the dense zones) and exhibit similar performances, whereas the estimation of the quadratic functional $Q(\theta)$ reveals more complex effects: it cannot be achieved on the basis of the sparsity condition $\theta\in B_0(s)$ alone. Finally, we apply our results on estimation of the $\ell_2$-norm to the problem of testing against sparse alternatives. In particular, we obtain a non-asymptotic analog of the Ingster-Donoho-Jin theory revealing some effects that were not captured by the previous asymptotic analysis.

Keywords: nonasymptotic minimax estimation; linear functional; quadratic functional; 
sparsity; unknown noise variance; thresholding 


1 Introduction 

In this paper, we consider the model

$$y_j=\theta_j+\sigma\xi_j,\qquad j=1,\dots,d, \tag{1}$$

where $\theta=(\theta_1,\dots,\theta_d)\in\mathbb{R}^d$ is an unknown vector of parameters, $\xi_1,\dots,\xi_d$ are i.i.d. standard normal random variables, and $\sigma>0$ is the noise level. We study the problem of estimation of the linear and quadratic functionals

$$L(\theta)=\sum_{i=1}^d\theta_i\quad\text{and}\quad Q(\theta)=\sum_{i=1}^d\theta_i^2,$$

and of the $\ell_2$-norm

$$\|\theta\|_2=\sqrt{Q(\theta)},$$

based on the observations $y_1,\dots,y_d$.

In this paper, we assume that $\theta$ belongs to a given subset $\Theta$ of $\mathbb{R}^d$. We will be considering classes $\Theta$ with elements satisfying the sparsity constraints $\|\theta\|_0\le s$, where $\|\theta\|_0$ denotes the number of non-zero components of $\theta$, or $\|\theta\|_q\le r$, where

$$\|\theta\|_q=\Big(\sum_{j=1}^d|\theta_j|^q\Big)^{1/q}.$$

Here, $r,q>0$ and the integer $s\in[1,d]$ are given constants.

Let $T(\theta)$ be one of the functionals $L(\theta)$, $Q(\theta)$ or $\sqrt{Q(\theta)}$. As a measure of quality of an estimator $\hat T$ of the functional $T(\theta)$, we consider the maximum squared risk

$$\sup_{\theta\in\Theta}\mathbf{E}_\theta(\hat T-T(\theta))^2,$$

where $\mathbf{E}_\theta$ denotes the expectation with respect to the probability measure of the vector of observations $(y_1,\dots,y_d)$ satisfying (1). The best possible quality is characterized by the minimax risk

$$R_T^*(\Theta)=\inf_{\hat T}\sup_{\theta\in\Theta}\mathbf{E}_\theta(\hat T-T(\theta))^2,$$

where $\inf_{\hat T}$ denotes the infimum over all estimators. In this paper, we find minimax optimal estimators of $T(\theta)$, i.e., estimators $\hat T$ such that

$$\sup_{\theta\in\Theta}\mathbf{E}_\theta(\hat T-T(\theta))^2\asymp R_T^*(\Theta). \tag{2}$$

Here and below, we write $a\asymp b$ if $c\le a/b\le C$ for some absolute positive constants $c$ and $C$. Note that the minimax optimality is considered here in the non-asymptotic sense, i.e., (2) should hold for all $d$ and $\sigma$.

The literature on minimax estimation of linear and quadratic functionals is rather extensive. The analysis of estimators of linear functionals from the minimax point of view was initiated in [20], while for the quadratic functionals we refer to [15]. These papers, as well as the subsequent publications [10, 11, 14, 16, 18, 19, 25, 26, 28, 29, 30, 31, 32, 33, 34], focus on minimax estimation of functionals on the classes $\Theta$ describing the smoothness properties of functions in terms of their Fourier or wavelet coefficients. Typical examples are Sobolev ellipsoids, hyperrectangles or Besov bodies, while a typical example of a linear functional is the value of a smooth function at a point. In this framework, a deep analysis of estimation of functionals is now available, including the minimax rates (and in some cases the minimax constants), oracle inequalities and adaptation. Extensions to linear inverse problems have been considered in detail by [7, 8, 17]. Note that the classes $\Theta$ studied in this literature are convex. Estimation of functionals on the non-convex sparsity classes $B_0(s)=\{\theta\in\mathbb{R}^d:\|\theta\|_0\le s\}$ or $B_q(r)=\{\theta\in\mathbb{R}^d:\|\theta\|_q\le r\}$ with $0<q<1$ has received much less attention. We are only aware of the paper [9], which establishes upper and lower bounds on the minimax risk for estimators of the linear functional $L(\theta)$ on the class $B_0(s)$. However, that paper considers the special case when $s\le d^a$ for some $a<1/2$ and $\sigma=1/\sqrt d$, and there is a logarithmic gap between the upper and lower bounds. Minimax rates for the estimation of $Q(\theta)$ and of the $\ell_2$-norm on the classes $B_0(s)$ and $B_q(r)$, $0<q\le 2$, were not studied. Note that estimation of the $\ell_2$-norm is closely related to minimax optimal testing of hypotheses under the $\ell_2$ separation distance in the spirit of [23]. Indeed, the optimal tests for this problem are based on estimators of the $\ell_2$-norm. A non-asymptotic study of minimax rates of testing for the classes $B_0(s)$ and $B_q(r)$, $0<q<2$, is given in [4] and [38]. But for the testing problem the risk function is different, and these papers do not provide results on the estimation of the $\ell_2$-norm. Note also that the upper bounds on the minimax rates of testing in [4] and [38] depart from the lower bounds by a logarithmic factor.

In this paper, we find non-asymptotic minimax rates of estimation of the above three functionals on the sparsity classes $B_0(s)$, $B_q(r)$ and construct optimal estimators that attain these rates. We deal with the non-convex classes $B_q$ ($0<q<1$) for the linear functional and with the classes that are not quadratically convex ($0<q<2$) for $Q(\theta)$ and for the $\ell_2$-norm. Our main object of interest is the class $B_0(s)$, for which we also provide completely adaptive estimators (independent of $\sigma$ and $s$) having only logarithmically slower rates than the minimax ones. Some interesting effects should be noted. First, we show that, for the linear functional and the $\ell_2$-norm, there are, in general, three zones in the rates of convergence that we call the sparse zone, the dense zone and the degenerate zone, while for the quadratic functional an additional fourth zone appears. Next, as opposed to estimation of the vector $\theta$ in the $\ell_2$-norm, cf. [13, 5, 1, 27, 35, 38], the correct logarithmic terms in the optimal rates for the sparse zone scale as $\log(d/s^2)$ and not as $\log(d/s)$. Notably, for the class $B_0(s)$, the rates of estimation of the linear functional and of the $\ell_2$-norm have a simple elbow at $s=\sqrt d$ (boundary between the sparse and the dense zones) and exhibit similar performances, whereas the estimation of the quadratic functional $Q(\theta)$ reveals more complex effects and cannot be achieved on the basis of the sparsity condition $\theta\in B_0(s)$ alone. Finally, we apply our results on estimation of the $\ell_2$-norm to the problem of testing against sparse alternatives. In particular, we obtain a non-asymptotic analog of the Ingster-Donoho-Jin theory revealing some effects that were not captured by the previous asymptotic analysis.


2 Minimax estimation of the linear functional 


In this section, we study the minimax rates of estimation of the linear functional $L(\theta)$ and we construct minimax optimal estimators.

Assume first that $\Theta$ is the class of $s$-sparse vectors $B_0(s)=\{\theta\in\mathbb{R}^d:\|\theta\|_0\le s\}$, where $s$ is a given integer, $1\le s\le d$. Consider the estimator

$$\hat L=\begin{cases}\sum_{j=1}^d y_j\,\mathbb{1}_{\{|y_j|>\sigma\sqrt{2\log(1+d/s^2)}\}} & \text{if } s\le\sqrt d,\\ \sum_{j=1}^d y_j & \text{if } s>\sqrt d,\end{cases}$$

where $\mathbb{1}_{\{\cdot\}}$ denotes the indicator function.
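The estimator $\hat L$ can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function name `linear_functional_estimator` is hypothetical, and $\sigma$ and $s$ are assumed known, as in the definition above.

```python
import math


def linear_functional_estimator(y, s, sigma):
    """Sketch of L-hat, an estimator of L(theta) = sum_j theta_j on B_0(s).

    y     : list of observations y_1, ..., y_d from model (1)
    s     : sparsity level, assumed to satisfy 1 <= s <= d
    sigma : noise level, assumed known
    """
    d = len(y)
    if s <= math.sqrt(d):
        # Sparse zone: sum only the observations whose absolute value
        # exceeds the threshold sigma * sqrt(2 log(1 + d / s^2)).
        threshold = sigma * math.sqrt(2.0 * math.log(1.0 + d / s**2))
        return sum(yj for yj in y if abs(yj) > threshold)
    # Dense zone: no thresholding, just sum all observations.
    return sum(y)


# Toy data: five large coordinates followed by 95 small ones (d = 100).
y = [10.0] * 5 + [0.5] * 95
print(linear_functional_estimator(y, s=5, sigma=1.0))   # 50.0 (sparse zone)
print(linear_functional_estimator(y, s=20, sigma=1.0))  # 97.5 (dense zone)
```

In the sparse call the threshold is $\sqrt{2\log 5}\approx1.79$, so the coordinates equal to $0.5$ are discarded; in the dense call ($s=20>\sqrt{100}$) all observations are summed.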
The following theorem shows that

$$\psi_\sigma^L(s,d)=\sigma^2s^2\log(1+d/s^2)$$

is the minimax rate of estimation of the linear functional on the class $B_0(s)$ and that $\hat L$ is a minimax optimal estimator.

Theorem 1. There exist absolute constants $c>0$, $C>0$ such that, for any integers $s,d$ satisfying $1\le s\le d$, and any $\sigma>0$,

$$\sup_{\theta\in B_0(s)}\mathbf{E}_\theta(\hat L-L(\theta))^2\le C\psi_\sigma^L(s,d), \tag{3}$$

and

$$R_L^*(B_0(s))\ge c\,\psi_\sigma^L(s,d). \tag{4}$$

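Proofs of (3) and of (4) are given in Sections 8 and 7, respectively. Note that since $\log(1+u)\ge u/2$ for $0<u\le1$, and $\log(1+u)\le u$ for $u>0$, we have

$$\sigma^2s^2\log(1+d/s^2)\asymp\min\big(\sigma^2s^2\log(1+d/s^2),\,\sigma^2d\big) \tag{5}$$

for all $1\le s\le d$. Thus,

$$R_L^*(B_0(s))\asymp\min\big(\sigma^2s^2\log(1+d/s^2),\,\sigma^2d\big). \tag{6}$$

In particular, for $s\ge\sqrt d$ we have $d/s^2\le1$, so that $\sigma^2d/2\le\psi_\sigma^L(s,d)\le\sigma^2d$: in the dense zone the rate saturates at $\sigma^2d$.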
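The two inequalities on the logarithm behind (5) can be checked numerically on a grid of $s$. The sketch below is only a sanity check, not part of the proof; the helper names `psi_linear` and `check_elbow` are hypothetical.

```python
import math


def psi_linear(s, d, sigma):
    """Minimax rate psi^L_sigma(s, d) = sigma^2 s^2 log(1 + d / s^2)."""
    return sigma**2 * s**2 * math.log(1.0 + d / s**2)


def check_elbow(d, sigma=1.0):
    """Check psi^L <= sigma^2 d for every s in [1, d], and
    psi^L >= sigma^2 d / 2 in the dense zone s >= sqrt(d) (where d / s^2 <= 1)."""
    for s in range(1, d + 1):
        psi = psi_linear(s, d, sigma)
        # log(1 + u) <= u gives psi <= sigma^2 d (tiny float tolerance).
        if psi > sigma**2 * d * (1 + 1e-12):
            return False
        # log(1 + u) >= u / 2 for 0 < u <= 1 gives psi >= sigma^2 d / 2.
        if s * s >= d and psi < sigma**2 * d / 2:
            return False
    return True


print(check_elbow(10_000))  # True
```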


We consider now the classes $B_q(r)=\{\theta\in\mathbb{R}^d:\|\theta\|_q\le r\}$, where $0<q\le1$ and $r$ is a positive number. For any $r,\sigma,q>0$ and any integer $d\ge1$, we define the integer

$$m=\max\{s\ge1:\ \sigma^2\log(1+d/s^2)\le r^2s^{-2/q},\ s\in\mathbb{N}\} \tag{7}$$

if the set $\{s\ge1:\ \sigma^2\log(1+d/s^2)\le r^2s^{-2/q},\ s\in\mathbb{N}\}$ is non-empty, and we put $m=0$ if this set is empty. The next two theorems show that the optimal rate of convergence of estimators of the linear functional on the class $B_q(r)$ is of the form

$$\psi_\sigma^{L,q}(r,d)=\begin{cases}\sigma^2m^2\log(1+d/m^2) & \text{if } m\ge1,\\ r^2 & \text{if } m=0.\end{cases}$$
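The integer $m$ in (7) and the rate $\psi_\sigma^{L,q}(r,d)$ can be computed by brute force. The sketch below is illustrative and makes one assumption not stated in (7): it assumes $0<q\le1$ and searches only over $1\le s\le d$. For $q\le1$ the map $s\mapsto\sigma^2s^{2/q}\log(1+d/s^2)$ is non-decreasing, so the admissible set is $\{1,\dots,m\}$ and the truncation at $d$ does not change whether $m>\sqrt d$ when $d\ge2$. The function names are hypothetical.

```python
import math


def effective_sparsity(r, d, sigma, q):
    """Integer m of (7), searching only over 1 <= s <= d (assumption).

    Returns 0 if no s in [1, d] satisfies
    sigma^2 log(1 + d / s^2) <= r^2 s^(-2/q).
    """
    m = 0
    for s in range(1, d + 1):
        lhs = sigma**2 * math.log(1.0 + d / s**2)
        rhs = r**2 * s ** (-2.0 / q)
        if lhs <= rhs:
            m = s  # keep the largest admissible s
    return m


def psi_linear_q(r, d, sigma, q):
    """Rate psi^{L,q}_sigma(r, d) on the l_q-ball B_q(r)."""
    m = effective_sparsity(r, d, sigma, q)
    if m == 0:
        return r**2  # degenerate zone: the zero estimator is optimal
    return sigma**2 * m**2 * math.log(1.0 + d / m**2)


for r in (0.1, 3.0, 1e6):
    print(r, effective_sparsity(r, 100, 1.0, 0.5), psi_linear_q(r, 100, 1.0, 0.5))
```

With $d=100$, $\sigma=1$, $q=1/2$, the three radii fall in the degenerate ($m=0$), sparse ($m=1$) and dense ($m=100>\sqrt d$) zones, respectively.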


The following theorem shows that $\psi_\sigma^{L,q}(r,d)$ is a lower bound on the convergence rate of the minimax risk of the linear functional on the class $B_q(r)$.

Theorem 2. If $0<q\le1$, then there exists a constant $c>0$ such that, for any integer $d\ge1$ and any $r,\sigma>0$, we have

$$R_L^*(B_q(r))\ge c\,\psi_\sigma^{L,q}(r,d). \tag{8}$$


The proof of Theorem 2 is given in Section 7. 

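We now turn to the construction of minimax optimal estimators on $B_q(r)$. For $0<q\le1$, define the following statistic

$$\hat L_q=\begin{cases}\sum_{j=1}^d y_j & \text{if } m>\sqrt d,\\ \sum_{j=1}^d y_j\,\mathbb{1}_{\{|y_j|>2\sigma\sqrt{2\log(1+d/m^2)}\}} & \text{if } 1\le m\le\sqrt d,\\ 0 & \text{if } m=0.\end{cases}$$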
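Given $m$ from (7), the three-regime statistic $\hat L_q$ can be sketched as follows. This is illustrative only: `m` is passed in rather than recomputed, $\sigma$ is assumed known, and the function name is hypothetical. Note the extra factor 2 in the threshold compared with $\hat L$.

```python
import math


def linear_functional_estimator_q(y, m, sigma):
    """Sketch of L-hat_q on B_q(r); m is the effective sparsity from (7)."""
    d = len(y)
    if m == 0:
        # Degenerate zone: the zero estimator.
        return 0.0
    if m > math.sqrt(d):
        # Dense zone: plain sum of the observations.
        return sum(y)
    # Sparse zone: threshold 2 * sigma * sqrt(2 log(1 + d / m^2)).
    threshold = 2.0 * sigma * math.sqrt(2.0 * math.log(1.0 + d / m**2))
    return sum(yj for yj in y if abs(yj) > threshold)


y = [10.0] * 3 + [0.0] * 97
print(linear_functional_estimator_q(y, m=5, sigma=1.0))   # 30.0 (sparse zone)
print(linear_functional_estimator_q(y, m=50, sigma=1.0))  # 30.0 (dense zone)
print(linear_functional_estimator_q(y, m=0, sigma=1.0))   # 0.0 (degenerate zone)
```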
Theorem 3. Let $0<q\le1$. There exists a constant $C>0$ such that, for any integer $d\ge1$ and any $r,\sigma>0$, we have

$$\sup_{\theta\in B_q(r)}\mathbf{E}_\theta(\hat L_q-L(\theta))^2\le C\psi_\sigma^{L,q}(r,d). \tag{9}$$

The proof of Theorem 3 is given in Section 8. Theorems 2 and 3 imply that $\psi_\sigma^{L,q}(r,d)$ is the minimax rate of estimation of the linear functional on the ball $B_q(r)$ and that $\hat L_q$ is a minimax optimal estimator.

Some remarks are in order here. Apart from the degenerate case $m=0$, when the zero estimator is optimal, we obtain on $B_q(r)$ the same expression for the optimal rate as on the class $B_0(s)$, with the difference that the sparsity $s$ is now replaced by the "effective sparsity" $m$. Heuristically, $m$ is obtained as a solution of

$$\sigma^2m^2\log(1+d/m^2)\asymp r^2m^{2-2/q},$$

where the left-hand side represents the estimation error for $m$-sparse signals established in Theorem 1 and the right-hand side gives the error of approximating a vector from $B_q(r)$ by an $m$-sparse vector in squared $\ell_1$-norm. Note also that, in view of (5), we can equivalently write the optimal rate in the form

$$\psi_\sigma^{L,q}(r,d)\asymp\begin{cases}\sigma^2d & \text{if } m>\sqrt d,\\ \sigma^2m^2\log(1+d/m^2) & \text{if } 1\le m\le\sqrt d,\\ r^2 & \text{if } m=0.\end{cases}$$


Thus, the optimal rate on $B_q(r)$ has in fact three regimes that we will call the dense zone ($m>\sqrt d$), the sparse zone ($1\le m\le\sqrt d$), and the degenerate zone ($m=0$). Furthermore, it follows from the definition of $m$ (which gives $m\asymp(r/\sigma)^q\log^{-q/2}(1+d/m^2)$ in the sparse zone) that the rate $\psi_\sigma^{L,q}(r,d)$ in the sparse zone is of the order $\sigma^2(r/\sigma)^{2q}\log^{1-q}\big(1+d(\sigma/r)^{2q}\big)$, which leads to

$$\psi_\sigma^{L,q}(r,d)\asymp\begin{cases}\sigma^2d & \text{if } m>\sqrt d,\\ \sigma^2(r/\sigma)^{2q}\log^{1-q}\big(1+d(\sigma/r)^{2q}\big) & \text{if } 1\le m\le\sqrt d,\\ r^2 & \text{if } m=0.\end{cases}$$

In particular, for $q=1$, the logarithmic factor disappears from the rate, and the optimal rates in the sparse and degenerate zones are both equal to $r^2$. Therefore, for $q=1$, there is no need to introduce thresholding in the definition of $\hat L_q$, and it is enough to use only the zero estimator for $m\le\sqrt d$ and the estimator $\sum_{j=1}^d y_j$ for $m>\sqrt d$ to achieve the optimal rate.


3 Minimax estimation of the quadratic functional 

Consider now the problem of estimation of the quadratic functional $Q(\theta)=\sum_{i=1}^d\theta_i^2$. For any integers $s,d$ satisfying $1\le s\le d$, and any $\sigma>0$, we introduce the notation

$$\psi_\sigma(s,d)=\begin{cases}\sigma^4s^2\log^2(1+d/s^2) & \text{if } s\le\sqrt d,\\ \sigma^4d & \text{if } s>\sqrt d.\end{cases}$$

The following theorem shows that

$$\psi_\sigma^Q(s,d,\kappa)=\min\big\{\kappa^4,\ \max\{\sigma^2\kappa^2,\ \psi_\sigma(s,d)\}\big\}$$

is a lower bound on the convergence rate of the minimax risk of the quadratic functional on the class $B_2(\kappa)\cap B_0(s)$, where $B_2(\kappa)=\{\theta\in\mathbb{R}^d:\|\theta\|_2\le\kappa\}$.
Theorem 4. There exists an absolute constant $c>0$ such that, for any integers $s,d$ satisfying $1\le s\le d$, and any $\kappa,\sigma>0$, we have

$$R_Q^*(B_2(\kappa)\cap B_0(s))\ge c\,\psi_\sigma^Q(s,d,\kappa). \tag{10}$$

The proof of Theorem 4 is given in Section 7.

One of the consequences of Theorem 4 is that $R_Q^*(B_0(s))=\infty$ (set $\kappa=\infty$ in (10)). Thus, only classes smaller than $B_0(s)$ are of interest when estimating the quadratic functional. The class $B_2(\kappa)\cap B_0(s)$ naturally arises in this context, but other classes can be considered as well. We now turn to the construction of a minimax optimal estimator on $B_2(\kappa)\cap B_0(s)$. Set

$$\alpha_s=\mathbf{E}\big(X^2\,\big|\,X^2>2\log(1+d/s^2)\big)=\frac{\mathbf{E}\big(X^2\mathbb{1}_{\{|X|>\sqrt{2\log(1+d/s^2)}\}}\big)}{\mathbf{P}\big(|X|>\sqrt{2\log(1+d/s^2)}\big)},$$

where $X\sim\mathcal N(0,1)$ denotes a standard normal random variable. Introduce the notation


$$\bar\psi_\sigma(s,d,\kappa)=\max\{\sigma^2\kappa^2,\ \psi_\sigma(s,d)\}.$$

Thus,

$$\psi_\sigma^Q(s,d,\kappa)=\min\{\kappa^4,\ \bar\psi_\sigma(s,d,\kappa)\}. \tag{11}$$
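The constant $\alpha_s$ has a closed form. Integrating by parts gives $\int_t^\infty x^2\varphi(x)\,dx=t\varphi(t)+\bar\Phi(t)$, where $\varphi$ is the standard normal density and $\bar\Phi(t)=\mathbf P(X>t)$, hence

$$\mathbf E(X^2\mid|X|>t)=1+\frac{t\varphi(t)}{\bar\Phi(t)},\qquad \alpha_s=\mathbf E(X^2\mid|X|>t_s),\quad t_s=\sqrt{2\log(1+d/s^2)}.$$

A minimal standard-library sketch (the function names are hypothetical):

```python
import math


def conditional_second_moment(t):
    """E(X^2 | |X| > t) for X ~ N(0, 1), via 1 + t * phi(t) / P(X > t)."""
    phi = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)  # density at t
    upper_tail = 0.5 * math.erfc(t / math.sqrt(2.0))          # P(X > t)
    return 1.0 + t * phi / upper_tail


def alpha_s(s, d):
    """alpha_s = E(X^2 | X^2 > 2 log(1 + d / s^2))."""
    t = math.sqrt(2.0 * math.log(1.0 + d / s**2))
    return conditional_second_moment(t)


for s in (1, 5, 10):
    print(s, alpha_s(s, 100))
```

With this choice, every term $(y_j^2-\alpha_s\sigma^2)\mathbb 1_{\{|y_j|>\sigma t_s\}}$ with $\theta_j=0$ has zero mean, which is the role of $\alpha_s$ in the statistic $\hat Q$ defined next. Using `math.erfc` rather than `1 - P(X <= t)` avoids cancellation for large thresholds.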


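Define the following statistic

$$\hat Q=\begin{cases}\sum_{j=1}^d(y_j^2-\alpha_s\sigma^2)\,\mathbb{1}_{\{|y_j|>\sigma\sqrt{2\log(1+d/s^2)}\}} & \text{if } s\le\sqrt d \text{ and } \kappa^4\ge\bar\psi_\sigma(s,d,\kappa),\\ \sum_{j=1}^d y_j^2-d\sigma^2 & \text{if } s>\sqrt d \text{ and } \kappa^4\ge\bar\psi_\sigma(s,d,\kappa),\\ 0 & \text{if } \kappa^4<\bar\psi_\sigma(s,d,\kappa).\end{cases}$$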
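A sketch of $\hat Q$ under the assumptions of Theorem 5 ($\sigma$, $s$ and $\kappa$ known). It is self-contained, so it repeats the closed form of $\alpha_s$; all function names are hypothetical.

```python
import math


def conditional_second_moment(t):
    """E(X^2 | |X| > t) for X ~ N(0, 1)."""
    phi = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    return 1.0 + t * phi / (0.5 * math.erfc(t / math.sqrt(2.0)))


def psi_bar(s, d, kappa, sigma):
    """bar-psi_sigma(s, d, kappa) = max{sigma^2 kappa^2, psi_sigma(s, d)}."""
    if s <= math.sqrt(d):
        psi = sigma**4 * s**2 * math.log(1.0 + d / s**2) ** 2
    else:
        psi = sigma**4 * d
    return max(sigma**2 * kappa**2, psi)


def quadratic_functional_estimator(y, s, sigma, kappa):
    """Sketch of Q-hat on B_2(kappa) intersected with B_0(s)."""
    d = len(y)
    if kappa**4 < psi_bar(s, d, kappa, sigma):
        return 0.0  # kappa is too small: the zero estimator is used
    if s <= math.sqrt(d):
        t = math.sqrt(2.0 * math.log(1.0 + d / s**2))
        a_s = conditional_second_moment(t)
        # Thresholded sum of squares, each term corrected by alpha_s sigma^2.
        return sum(yj**2 - a_s * sigma**2 for yj in y if abs(yj) > sigma * t)
    # Dense zone: sum(y_j^2) - d sigma^2 is unbiased for Q(theta).
    return sum(yj**2 for yj in y) - d * sigma**2


y = [10.0] * 5 + [0.0] * 95
print(quadratic_functional_estimator(y, s=5, sigma=1.0, kappa=100.0))
print(quadratic_functional_estimator(y, s=50, sigma=1.0, kappa=100.0))  # 400.0
print(quadratic_functional_estimator(y, s=5, sigma=1.0, kappa=1.0))     # 0.0
```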


Theorem 5. There exists an absolute constant $C>0$ such that, for any integers $s,d$ satisfying $1\le s\le d$, and any $\kappa,\sigma>0$, we have

$$\sup_{\theta\in B_2(\kappa)\cap B_0(s)}\mathbf{E}_\theta(\hat Q-Q(\theta))^2\le C\psi_\sigma^Q(s,d,\kappa). \tag{12}$$

The proof of Theorem 5 is given in Section 8. Theorems 4 and 5 imply that $\psi_\sigma^Q(s,d,\kappa)$ is the minimax rate of estimation of the quadratic functional on the class $B_2(\kappa)\cap B_0(s)$ and that $\hat Q$ is a minimax optimal estimator.

As a corollary, we obtain the minimax rate of convergence on the class $B_2(\kappa)$ (set $s=d$ in Theorems 4 and 5). In this case, the estimator $\hat Q$ takes the form

$$\hat Q^*=\begin{cases}\sum_{j=1}^d y_j^2-d\sigma^2 & \text{if } \kappa^4\ge\max\{\sigma^2\kappa^2,\sigma^4d\},\\ 0 & \text{if } \kappa^4<\max\{\sigma^2\kappa^2,\sigma^4d\}.\end{cases}$$

Corollary 1. There exist absolute constants $c,C>0$ such that, for any $\kappa,\sigma>0$, we have

$$\sup_{\theta\in B_2(\kappa)}\mathbf{E}_\theta(\hat Q^*-Q(\theta))^2\le C\min\{\kappa^4,\max(\sigma^2\kappa^2,\sigma^4d)\}, \tag{13}$$

and

$$R_Q^*(B_2(\kappa))\ge c\min\{\kappa^4,\max(\sigma^2\kappa^2,\sigma^4d)\}. \tag{14}$$
Note that the upper bounds of Theorem 5 and Corollary 1 obviously remain valid for the positive part estimators $\hat Q^+=\max\{\hat Q,0\}$ and $\hat Q^{*+}=\max\{\hat Q^*,0\}$, since $Q(\theta)\ge0$. The upper rate as in (13) on the class $B_2(\kappa)$ with an extra logarithmic factor is obtained for different estimators in [25, 26].

Alternatively, we consider the classes $B_q(r)$, where $r$ is a positive number and $0<q\le2$. As opposed to the case of $B_0(s)$, we do not need to consider the intersection with $B_2(\kappa)$. Indeed, the $\ell_2$-norm of $\theta$ is uniformly bounded thanks to the inclusion $B_q(r)\subseteq B_2(r)$.


For any $r,\sigma>0$, $0<q\le2$, and any integer $d\ge1$ we set

$$\psi_\sigma^{Q,q}(r,d)=\begin{cases}\max\{\sigma^2r^2,\ \sigma^4d\} & \text{if } m>\sqrt d,\\ \max\{\sigma^2r^2,\ \sigma^4m^2\log^2(1+d/m^2)\} & \text{if } 1\le m\le\sqrt d,\\ r^4 & \text{if } m=0,\end{cases}$$

where $m$ is the integer defined above (cf. (7)) and depending only on $d,r,\sigma,q$. The following theorem shows that $\psi_\sigma^{Q,q}(r,d)$ is a lower bound on the convergence rate of the minimax risk of the quadratic functional on the class $B_q(r)$.

Theorem 6. Let $0<q\le2$. There exists a constant $c>0$ such that, for any integer $d\ge1$, and any $r,\sigma>0$, we have

$$R_Q^*(B_q(r))\ge c\,\psi_\sigma^{Q,q}(r,d). \tag{15}$$


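We now turn to the construction of minimax optimal estimators on $B_q(r)$. Consider the following statistic

$$\hat Q_q=\begin{cases}\sum_{j=1}^d y_j^2-d\sigma^2 & \text{if } m>\sqrt d,\\ \sum_{j=1}^d(y_j^2-\tilde\alpha_m\sigma^2)\,\mathbb{1}_{\{|y_j|>2\sigma\sqrt{2\log(1+d/m^2)}\}} & \text{if } 1\le m\le\sqrt d,\\ 0 & \text{if } m=0,\end{cases}$$

where $\tilde\alpha_m=\mathbf{E}\big(X^2\,\big|\,X^2>8\log(1+d/m^2)\big)$, $X\sim\mathcal N(0,1)$.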
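The statistic $\hat Q_q$ can be sketched in the same way as $\hat Q$; the threshold $2\sigma\sqrt{2\log(1+d/m^2)}$ corresponds to $X^2>8\log(1+d/m^2)$ in the definition of $\tilde\alpha_m$. Illustrative only: $m$ is passed in (computed from (7)), $\sigma$ is assumed known, and the function names are hypothetical.

```python
import math


def conditional_second_moment(t):
    """E(X^2 | |X| > t) for X ~ N(0, 1)."""
    phi = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    return 1.0 + t * phi / (0.5 * math.erfc(t / math.sqrt(2.0)))


def quadratic_functional_estimator_q(y, m, sigma):
    """Sketch of Q-hat_q on B_q(r); m is the effective sparsity from (7)."""
    d = len(y)
    if m == 0:
        return 0.0  # degenerate zone
    if m > math.sqrt(d):
        return sum(yj**2 for yj in y) - d * sigma**2  # dense zone
    # Sparse zone: threshold t = 2 sqrt(2 log(1 + d / m^2)) in units of sigma,
    # so that t^2 = 8 log(1 + d / m^2) and tilde-alpha_m = E(X^2 | |X| > t).
    t = 2.0 * math.sqrt(2.0 * math.log(1.0 + d / m**2))
    alpha_m = conditional_second_moment(t)
    return sum(yj**2 - alpha_m * sigma**2 for yj in y if abs(yj) > sigma * t)


y = [10.0] * 3 + [0.0] * 97
print(quadratic_functional_estimator_q(y, m=5, sigma=1.0))
print(quadratic_functional_estimator_q(y, m=50, sigma=1.0))  # 200.0
```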


Theorem 7. Let $0<q\le2$. There exists a constant $C>0$ such that, for any integer $d\ge1$, and any $r,\sigma>0$, we have

$$\sup_{\theta\in B_q(r)}\mathbf{E}_\theta(\hat Q_q-Q(\theta))^2\le C\psi_\sigma^{Q,q}(r,d). \tag{16}$$

The proof of Theorem 7 is given in Section 8. Theorems 6 and 7 imply that $\psi_\sigma^{Q,q}(r,d)$ is the minimax rate of estimation of the quadratic functional on the class $B_q(r)$ and that $\hat Q_q$ is a minimax optimal estimator.

Notice that, in view of the definition of $m$, in the sparse zone we have

$$\sigma^4m^2\log^2(1+d/m^2)\asymp\sigma^4(r/\sigma)^{2q}\log^{2-q}\big(1+d(\sigma/r)^{2q}\big),$$

which leads to

$$\psi_\sigma^{Q,q}(r,d)\asymp\begin{cases}\max\{\sigma^2r^2,\ \sigma^4d\} & \text{if } m>\sqrt d,\\ \max\{\sigma^2r^2,\ \sigma^4(r/\sigma)^{2q}\log^{2-q}\big(1+d(\sigma/r)^{2q}\big)\} & \text{if } 1\le m\le\sqrt d,\\ r^4 & \text{if } m=0.\end{cases}$$

One can check that for $q=2$ this rate is of the same order as the rate obtained in Corollary 1.
4 Minimax estimation of the $\ell_2$-norm

Interestingly, the minimax rates of estimation of the $\ell_2$-norm $\|\theta\|_2=\sqrt{Q(\theta)}$ do not depend on the radius $\kappa$, as opposed to the rates for $Q(\theta)$ established above. It turns out that the restriction to $B_2(\kappa)$ is not needed to get meaningful results for estimation of $\sqrt{Q(\theta)}$ on the sparsity classes. We drop this restriction and assume that $\Theta=B_0(s)$. Consider the estimator

$$\hat N=\sqrt{\max\{\hat Q_*,0\}},$$

where

$$\hat Q_*=\begin{cases}\sum_{j=1}^d(y_j^2-\alpha_s\sigma^2)\,\mathbb{1}_{\{|y_j|>\sigma\sqrt{2\log(1+d/s^2)}\}} & \text{if } s\le\sqrt d,\\ \sum_{j=1}^d y_j^2-d\sigma^2 & \text{if } s>\sqrt d.\end{cases}$$
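Note that $\hat Q_*$ has the same form as the first two branches of $\hat Q$, without the comparison of $\kappa^4$ with $\bar\psi_\sigma$. A sketch of $\hat N$, assuming $\sigma$ and $s$ known as in Theorem 8 (function names hypothetical):

```python
import math


def conditional_second_moment(t):
    """E(X^2 | |X| > t) for X ~ N(0, 1)."""
    phi = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    return 1.0 + t * phi / (0.5 * math.erfc(t / math.sqrt(2.0)))


def q_star(y, s, sigma):
    """Sketch of Q-hat_*: thresholded (s <= sqrt d) or plain (s > sqrt d)."""
    d = len(y)
    if s <= math.sqrt(d):
        t = math.sqrt(2.0 * math.log(1.0 + d / s**2))
        a_s = conditional_second_moment(t)
        return sum(yj**2 - a_s * sigma**2 for yj in y if abs(yj) > sigma * t)
    return sum(yj**2 for yj in y) - d * sigma**2


def norm_estimator(y, s, sigma):
    """Sketch of N-hat = sqrt(max{Q-hat_*, 0}), an estimator of ||theta||_2."""
    return math.sqrt(max(q_star(y, s, sigma), 0.0))


print(norm_estimator([10.0] * 5 + [0.0] * 95, s=50, sigma=1.0))  # 20.0
```

Taking the positive part before the square root keeps $\hat N$ well defined when the bias-corrected sum of squares is negative.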

The following theorem shows that $\hat N$ is a minimax optimal estimator of the $\ell_2$-norm $\|\theta\|_2=\sqrt{Q(\theta)}$ on the class $B_0(s)$ and that the corresponding minimax rate of convergence is

$$\psi_\sigma^{\sqrt Q}(s,d)=\begin{cases}\sigma^2s\log(1+d/s^2) & \text{if } s\le\sqrt d,\\ \sigma^2\sqrt d & \text{if } s>\sqrt d.\end{cases}$$

Theorem 8. There exist absolute constants $c>0$, $C>0$ such that, for any integers $s,d$ satisfying $1\le s\le d$, and any $\sigma>0$,

$$\sup_{\theta\in B_0(s)}\mathbf{E}_\theta(\hat N-\|\theta\|_2)^2\le C\psi_\sigma^{\sqrt Q}(s,d), \tag{17}$$

and

$$R_{\sqrt Q}^*(B_0(s))\ge c\,\psi_\sigma^{\sqrt Q}(s,d). \tag{18}$$

Proofs of (17) and of (18) are given in Sections 8 and 7, respectively.

Our next step is to analyze the classes $B_q(r)$. For any $r,\sigma>0$, $0<q<2$, and any integer $d\ge1$ we set

$$\psi_\sigma^{\sqrt Q,q}(r,d)=\begin{cases}\sigma^2\sqrt d & \text{if } m>\sqrt d,\\ \sigma^2m\log(1+d/m^2) & \text{if } 1\le m\le\sqrt d,\\ r^2 & \text{if } m=0,\end{cases}$$

where $m$ is the integer defined above (cf. (7)) and depending only on $d,r,\sigma,q$. The estimator that we consider when $\theta$ belongs to the class $B_q(r)$ is

$$\hat N_q=\sqrt{\max\{\hat Q_q,0\}}.$$

Theorem 9. Let $0<q<2$. There exist constants $C,c>0$ such that, for any integer $d\ge1$, and any $r,\sigma>0$, we have

$$\sup_{\theta\in B_q(r)}\mathbf{E}_\theta(\hat N_q-\|\theta\|_2)^2\le C\psi_\sigma^{\sqrt Q,q}(r,d), \tag{19}$$

and

$$R_{\sqrt Q}^*(B_q(r))\ge c\,\psi_\sigma^{\sqrt Q,q}(r,d). \tag{20}$$

Proofs of (19) and of (20) are given in Sections 8 and 7, respectively.

As in the case of linear and quadratic functionals, we have an equivalent expression for the optimal rate:

$$\psi_\sigma^{\sqrt Q,q}(r,d)\asymp\begin{cases}\sigma^2\sqrt d & \text{if } m>\sqrt d,\\ \sigma^2(r/\sigma)^q\log^{1-q/2}\big(1+d(\sigma/r)^{2q}\big) & \text{if } 1\le m\le\sqrt d,\\ r^2 & \text{if } m=0.\end{cases}$$

Though we formally did not consider the case $q=2$, note that the logarithmic factor disappears from the above expression when $q=2$, and the optimal rates in the sparse and degenerate zones are both equal to $r^2$. This suggests that, for $q=2$, there is no need to introduce thresholding in the definition of $\hat N_q$, and it is enough to use only the zero estimator for $m\le\sqrt d$ and the estimator $\big(\max\{\sum_{j=1}^d y_j^2-d\sigma^2,0\}\big)^{1/2}$ for $m>\sqrt d$ to achieve the optimal rate.

5 Estimation with unknown noise level 

In this section, we discuss modifications of the above estimators when the noise level $\sigma$ is unknown. A general idea leading to our construction is that the smallest $y_j^2$ are likely to correspond to zero components of $\theta$, and thus to contain information on $\sigma$ not corrupted by $\theta$. Here, we will demonstrate this idea only for $s$-sparse vectors in the case $s\le\sqrt d$. Then at least $d-\sqrt d$ components of $\theta$ are zero, so that at most the $d-\sqrt d$ smallest $y_j^2$ can be used for estimation of the variance. Throughout this section, we assume that $d\ge3$.

We start by considering estimation of the linear functional. Then it is enough to replace $\sigma$ in the definition of $\hat L$ by the following statistic

$$\hat\sigma=3\Big(\frac1d\sum_{j\le d-\sqrt d}y_{(j)}^2\Big)^{1/2},$$

where $y_{(1)}^2\le\dots\le y_{(d)}^2$ are the order statistics associated to $y_1^2,\dots,y_d^2$. Note that $\hat\sigma$ is not a good estimator of $\sigma$ but rather an over-estimator. The resulting estimator of $L(\theta)$ is

$$\tilde L=\sum_{j=1}^d y_j\,\mathbb{1}_{\{|y_j|>\hat\sigma\sqrt{2\log(1+d/s^2)}\}}.$$
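A sketch of $\hat\sigma$ and $\tilde L$. It assumes $d\ge3$ and $s\le\sqrt d$ as in this section, and takes the number of retained order statistics to be $\lfloor d-\sqrt d\rfloor$ (an interpretation of the index range $j\le d-\sqrt d$). The function names are hypothetical.

```python
import math


def sigma_hat(y):
    """sigma-hat = 3 * sqrt((1/d) * sum of the floor(d - sqrt d) smallest y_j^2)."""
    d = len(y)
    k = math.floor(d - math.sqrt(d))
    smallest = sorted(yj * yj for yj in y)[:k]
    return 3.0 * math.sqrt(sum(smallest) / d)


def linear_functional_estimator_unknown_sigma(y, s):
    """Sketch of L-tilde: L-hat with sigma replaced by sigma-hat (s <= sqrt d)."""
    d = len(y)
    threshold = sigma_hat(y) * math.sqrt(2.0 * math.log(1.0 + d / s**2))
    return sum(yj for yj in y if abs(yj) > threshold)


y = [20.0] * 5 + [1.0] * 95
print(sigma_hat(y))                                    # equals 3 * sqrt(0.9)
print(linear_functional_estimator_unknown_sigma(y, 5))  # 100.0
```

Because the five large observations are among the $\sqrt d=10$ discarded largest squares, they do not inflate $\hat\sigma$ here.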

Theorem 10. There exists an absolute constant $C$ such that, for any integers $s$ and $d$ satisfying $s\le\sqrt d$, and any $\sigma>0$,

$$\sup_{\theta\in B_0(s)}\mathbf{E}_\theta(\tilde L-L(\theta))^2\le C\psi_\sigma^L(s,d).$$

The proof of Theorem 10 is given in Section 8.

Note that the estimator $\tilde L$ depends on $s$. To turn it into a completely data-driven one, we may consider

$$L'=\sum_{j=1}^d y_j\,\mathbb{1}_{\{|y_j|>\hat\sigma\sqrt{2\log d}\}}.$$

Inspection of the proof of Theorem 10 leads to the conclusion that

$$\sup_{\theta\in B_0(s)}\mathbf{E}_\theta(L'-L(\theta))^2\le C\sigma^2s^2\log d. \tag{21}$$

Thus, the rate for the data-driven estimator $L'$ is not optimal, but the deterioration is only in the expression under the logarithm ($\log d$ instead of $\log(1+d/s^2)$).

A data-driven estimator of the quadratic functional can be taken in the form

$$\tilde Q=\sum_{j=1}^d y_j^2\,\mathbb{1}_{\{|y_j|>\hat\sigma\sqrt{2\log d}\}}.$$
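Both data-driven estimators use the common threshold $\hat\sigma\sqrt{2\log d}$, so they can be computed together. A sketch under the same assumptions as for $\hat\sigma$ above ($d\ge3$, $\lfloor d-\sqrt d\rfloor$ retained order statistics); the function names are hypothetical.

```python
import math


def sigma_hat(y):
    """Over-estimator sigma-hat of sigma; assumes d >= 3."""
    d = len(y)
    k = math.floor(d - math.sqrt(d))
    return 3.0 * math.sqrt(sum(sorted(yj * yj for yj in y)[:k]) / d)


def data_driven_estimators(y):
    """Return (L', Q-tilde); neither s nor sigma is needed as input."""
    d = len(y)
    threshold = sigma_hat(y) * math.sqrt(2.0 * math.log(d))
    selected = [yj for yj in y if abs(yj) > threshold]
    l_prime = sum(selected)                    # L'
    q_tilde = sum(yj * yj for yj in selected)  # Q-tilde
    return l_prime, q_tilde


print(data_driven_estimators([20.0] * 5 + [1.0] * 95))  # (100.0, 2000.0)
```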

The following theorem shows that the estimator $\tilde Q$ is nearly minimax on $B_2(\kappa)\cap B_0(s)$ for $s\le\sqrt d$.

Theorem 11. There exists an absolute constant $C$ such that, for any integers $s$ and $d$ satisfying $s\le\sqrt d$, and any $\kappa,\sigma>0$,

$$\sup_{\theta\in B_2(\kappa)\cap B_0(s)}\mathbf{E}_\theta(\tilde Q-Q(\theta))^2\le C\max\big\{\sigma^2\kappa^2,\ \sigma^4s^2\log^2d\big\}.$$

The proof of Theorem 11 is given in Section 8.


6 Consequences for the problem of testing 

The results on estimation of the $\ell_2$-norm stated above allow us to obtain the solution of the problem of non-asymptotic minimax testing on the classes $B_0(s)$ and $B_q(r)$ under the $\ell_2$ separation distance. For $q\ge0$, $u>0$, and $\delta>0$, consider the set

$$\Theta_{q,u}(\delta)=\{\theta\in B_q(u):\ \|\theta\|_2\ge\delta\}.$$

Assume that we wish to test the hypothesis $H_0:\theta=0$ against the alternative

$$H_1:\ \theta\in\Theta_{q,u}(\delta).$$

Let $\Delta$ be a test statistic with values in $\{0,1\}$. We define the risk of the test $\Delta$ as the sum of the first type error and the maximum second type error:

$$\mathbf{P}_0(\Delta=1)+\sup_{\theta\in\Theta_{q,u}(\delta)}\mathbf{P}_\theta(\Delta=0).$$

A benchmark value is the minimax risk of testing

$$\mathcal R_{q,u}(\delta)=\inf_\Delta\Big\{\mathbf{P}_0(\Delta=1)+\sup_{\theta\in\Theta_{q,u}(\delta)}\mathbf{P}_\theta(\Delta=0)\Big\},$$

where $\inf_\Delta$ is the infimum over all $\{0,1\}$-valued statistics. The minimax rate of testing on $\Theta_{q,u}$ is defined as $\lambda>0$ for which the following two facts hold:

(i) for any $\varepsilon\in(0,1)$ there exists $A_\varepsilon>0$ such that, for all $A>A_\varepsilon$,

$$\mathcal R_{q,u}(A\lambda)\le\varepsilon, \tag{22}$$
(ii) for any $\varepsilon\in(0,1)$ there exists $a_\varepsilon>0$ such that, for all $0<A<a_\varepsilon$,

$$\mathcal R_{q,u}(A\lambda)\ge1-\varepsilon. \tag{23}$$

Note that this defines a non-asymptotic minimax rate of testing, as opposed to the classical asymptotic definition that can be found, for example, in [23]. A non-asymptotic minimax study of testing for the classes $B_0(s)$ and $B_q(r)$ is given by [4] and [38]. However, those papers derive the minimax rates of testing on $\Theta_{q,u}$ only up to a logarithmic factor. The next theorem provides the exact expression for the minimax rates in the considered testing setup.

Theorem 12. For any integers $s$ and $d$ satisfying $1\le s\le d$, and any $\sigma>0$, the minimax rate of testing on $\Theta_{0,s}$ is equal to $\lambda=\big(\psi_\sigma^{\sqrt Q}(s,d)\big)^{1/2}$. For any $0<q<2$, and any $r,\sigma>0$, the minimax rate of testing on $\Theta_{q,r}$ is equal to $\lambda=\big(\psi_\sigma^{\sqrt Q,q}(r,d)\big)^{1/2}$.

The proof of this theorem consists in establishing the upper bounds (22) and the lower bounds (23). We note first that the lower bounds (23) are essentially proved in [4] and [38]. However, in those papers they are stated in a somewhat different form, so for completeness we give a brief proof in Section 7, which is very close to the proofs of the lower bounds (18) and (20). The upper bounds (22) are straightforward in view of (17) and (19). Indeed, for example, to prove (22) with $q=0$ and $u=s$, we fix some $A>0$ and consider the test

$$\Delta^*=\mathbb{1}_{\{\hat N>(A/2)(\psi_\sigma^{\sqrt Q}(s,d))^{1/2}\}}. \tag{24}$$

Then, writing for brevity $\psi=\psi_\sigma^{\sqrt Q}(s,d)$ and applying Chebyshev's inequality, we have

$$\begin{aligned}\mathcal R_{0,s}(A\sqrt\psi)&\le\mathbf{P}_0(\Delta^*=1)+\sup_{\theta\in\Theta_{0,s}(A\sqrt\psi)}\mathbf{P}_\theta(\Delta^*=0)\\&\le\mathbf{P}_0(\hat N>A\sqrt\psi/2)+\sup_{\theta\in B_0(s)}\mathbf{P}_\theta(\hat N-\|\theta\|_2\le-A\sqrt\psi/2)\\&\le2\sup_{\theta\in B_0(s)}\frac{\mathbf{E}_\theta(\hat N-\|\theta\|_2)^2}{(A/2)^2\psi}\\&\le C_*A^{-2}\end{aligned} \tag{25}$$

for some absolute constant $C_*>0$, where the second inequality uses that $\|\theta\|_2\ge A\sqrt\psi$ for $\theta\in\Theta_{0,s}(A\sqrt\psi)$, and the last inequality follows from (17). Choosing $A_\varepsilon$ as a solution of $C_*A_\varepsilon^{-2}=\varepsilon$, we obtain (22). The case $0<q<2$ is treated analogously by introducing the test

$$\Delta_q^*=\mathbb{1}_{\{\hat N_q>(A/2)(\psi_\sigma^{\sqrt Q,q}(r,d))^{1/2}\}}$$

and using (19) rather than (17) to get the upper bound (22).
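The plug-in test (24) can be illustrated by simulation. The sketch below assumes known $\sigma$ and $s$ and estimates by Monte Carlo the type I error of $\Delta^*$ and its type II error at one particular alternative in $\Theta_{0,s}(A\sqrt\psi)$ (all nonzero coordinates equal). The alternative, the number of replications and the seed are arbitrary choices, and a single alternative only gives a lower estimate of the maximal type II error over $\Theta_{0,s}$. Function names are hypothetical.

```python
import math
import random


def conditional_second_moment(t):
    """E(X^2 | |X| > t) for X ~ N(0, 1)."""
    phi = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    return 1.0 + t * phi / (0.5 * math.erfc(t / math.sqrt(2.0)))


def norm_estimator(y, s, sigma):
    """N-hat = sqrt(max{Q-hat_*, 0}) on B_0(s)."""
    d = len(y)
    if s <= math.sqrt(d):
        t = math.sqrt(2.0 * math.log(1.0 + d / s**2))
        a_s = conditional_second_moment(t)
        q = sum(yj**2 - a_s * sigma**2 for yj in y if abs(yj) > sigma * t)
    else:
        q = sum(yj**2 for yj in y) - d * sigma**2
    return math.sqrt(max(q, 0.0))


def psi_norm(s, d, sigma):
    """psi^{sqrt Q}_sigma(s, d)."""
    if s <= math.sqrt(d):
        return sigma**2 * s * math.log(1.0 + d / s**2)
    return sigma**2 * math.sqrt(d)


def empirical_errors(d, s, sigma, A, reps=200, seed=0):
    """Monte Carlo type I error, and type II error at one alternative
    with ||theta||_2 = A * sqrt(psi)."""
    rng = random.Random(seed)
    psi = psi_norm(s, d, sigma)
    level = (A / 2.0) * math.sqrt(psi)  # test (24): reject if N-hat > level
    value = A * math.sqrt(psi / s)      # s equal coordinates, norm A sqrt(psi)
    theta = [value] * s + [0.0] * (d - s)
    type1 = type2 = 0
    for _ in range(reps):
        y0 = [sigma * rng.gauss(0.0, 1.0) for _ in range(d)]
        y1 = [th + sigma * rng.gauss(0.0, 1.0) for th in theta]
        type1 += norm_estimator(y0, s, sigma) > level
        type2 += norm_estimator(y1, s, sigma) <= level
    return type1 / reps, type2 / reps


print(empirical_errors(d=400, s=5, sigma=1.0, A=8.0))
```

For moderate $A$ both empirical errors are small in this configuration, in line with the bound $C_*A^{-2}$; decreasing $A$ makes the type II error grow.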

Furthermore, as a simple corollary we obtain a non-asymptotic analog of the Ingster-Donoho-Jin theory. Consider the problem of testing the hypothesis $H_0:\theta=0$ against the alternative $H_1:\theta\in\Theta_s(\delta)$, where

$$\Theta_s(\delta)=\{\theta\in\mathbb{R}^d:\ \|\theta\|_0=s,\ \theta_j\in\{0,\delta\},\ j=1,\dots,d\} \tag{26}$$

for some integer $s\in[1,d]$ and some $\delta>0$. [21] and [12] studied a slightly different but equivalent problem (with $\theta_j$ taking values $0$ and $\delta$ at random), assuming in addition that $s=d^a$ for some $a\in(0,1/2)$. In an asymptotic setting when $\sigma\to0$ and $d=d_\sigma\to\infty$, [21] obtained the detection boundary in the exact minimax sense, that is, the value $\lambda=\lambda_\sigma$ such that asymptotic analogs of (22) and (23) hold with $A_\varepsilon=a_\varepsilon$ and $\varepsilon=0$. [12] proved that the detection boundary is attained by the Higher Criticism test. Extensions to the regression and classification problems and more references can be found in [22], [24], [3]. Note that the alternatives in these papers are not defined in exactly the same way as in (26).

A natural non-asymptotic analog of these results consists in establishing the minimax rate of testing on $\Theta_s(\delta)$ in the sense of the definition (22)-(23). This is done in the next corollary, which covers not only $\Theta_s(\delta)$ but also the following more general class:

$$\Theta_s^*(\delta)=\Big\{\theta\in\mathbb{R}^d:\ \|\theta\|_0=s,\ \min_{j:\,\theta_j\ne0}|\theta_j|\ge\delta\Big\}.$$

We define the minimax rate of testing on the classes $\Theta_s$ and $\Theta_s^*$ in the same way as for $\Theta_{q,u}$, by modifying (22)-(23) in an obvious way.

Corollary 2. Let $s$ and $d$ be integers satisfying $1\le s\le d$, and let $\sigma>0$. The minimax rate of testing on $\Theta_s$ is equal to $\lambda=\sigma\sqrt{\log(1+d/s^2)}$ for $s\le\sqrt d$. Furthermore, the minimax rate of testing on $\Theta_s^*$ is equal to

$$\lambda=\begin{cases}\sigma\sqrt{\log(1+d/s^2)} & \text{if } s\le\sqrt d,\\ \sigma d^{1/4}/\sqrt s & \text{if } s>\sqrt d.\end{cases}$$


The proof of the upper bound in this corollary is essentially the same as in Theorem 12. We take the same test statistic $\Delta^*$ and then act as in (25), using that $\Theta_s(A\lambda)$ and $\Theta_s^*(A\lambda)$ are included in $\Theta_{0,s}(A\lambda\sqrt s)$ (any vector with $s$ nonzero coordinates of absolute value at least $A\lambda$ has $\ell_2$-norm at least $A\lambda\sqrt s$). The proof of the lower bound for the case $s\le\sqrt d$ is also the same as in Theorem 12, since the measure $\mu_\rho$ used in the proofs (cf. Section 7) is supported on $s$-sparse vectors $\theta$ with all nonzero coefficients taking the same value. For $s>\sqrt d$ we need a slightly different lower bound argument; see Section 7 for the details.

[21] and [12] derived the asymptotic rate of testing in the form $\lambda=c(a)\sigma\sqrt{\log d}$, where the exact value $c(a)>0$ is explicitly given as a function of $a$ appearing in the relation $s=d^a$, $0<a<1/2$. Corollary 2 allows us to explore more general behavior of $s$ leading to other types of rates. For example, we find that the minimax rate of testing is of the order $\sigma$ if $s=\sqrt d$, and it is of the order $\sigma\sqrt{\log\log d}$ if $s\asymp\sqrt d/(\log d)^\gamma$ for any $\gamma>0$. Such effects are not captured by the previous asymptotic results. Note also that the test $\Delta^*$ (cf. (24)) that achieves the minimax rates in Corollary 2 is very simple: it is a plug-in test based on the estimator of the $\ell_2$-norm. We do not need to invoke refined techniques such as the Higher Criticism test. However, we do not prove that our method achieves the exact constant $c(a)$ in the specific regime considered by [21] and [12].


7 Proofs of the lower bounds 

7.1 General tools 

Let $\mu$ be a probability measure on $\Theta$. Denote by $\mathbf{P}_\mu$ the mixture probability measure

$$\mathbf{P}_\mu=\int_\Theta\mathbf{P}_\theta\,\mu(d\theta).$$

A vector $\theta\in\mathbb{R}^d$ is called $s$-sparse if $\|\theta\|_0=s$. For an integer $s$ such that $1\le s\le d$ and $\rho>0$, we denote by $\mu_\rho$ the uniform distribution on the set of $s$-sparse vectors in $\mathbb{R}^d$ with all nonzero coefficients equal to $\sigma\rho$. Let

$$\chi^2(P',P)=\int(dP'/dP)^2\,dP-1$$

be the chi-square divergence between two mutually absolutely continuous probability measures $P'$ and $P$.

The following lemma is obtained by combining arguments from [4] and [9].

Lemma 1. For all $\sigma>0$, $\rho>0$, $1\le s\le d$, we have

$$\chi^2(\mathbf{P}_{\mu_\rho},\mathbf{P}_0)\le\Big(1-\frac sd+\frac sd e^{\rho^2}\Big)^s-1.$$

For completeness, the proof of this lemma is given in the Appendix. We will also need a second lemma, which is a special case of Theorem 2.15 in [37]:

Lemma 2. Let $\Theta$ be a subset of $\mathbb{R}^d$ containing $0$. Assume that there exist a probability measure $\mu$ on $\Theta$ and numbers $v>0$, $\beta>0$ such that $T(\theta)=2v$ for all $\theta\in\operatorname{supp}(\mu)$ and $\chi^2(\mathbf{P}_\mu,\mathbf{P}_0)\le\beta$. Then

$$\inf_{\hat T}\sup_{\theta\in\Theta}\mathbf{P}_\theta\big(|\hat T-T(\theta)|\ge v\big)\ge\frac14\exp(-\beta),$$

where $\inf_{\hat T}$ denotes the infimum over all estimators.

7.2 Proof of the lower bound (4) in Theorem 1 

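Set $\rho=\sqrt{\log(1+d/s^2)}$. Then, by Lemma 1,

$$\chi^2(\mathbf{P}_{\mu_\rho},\mathbf{P}_0)\le\Big(1-\frac sd+\frac sd\Big(1+\frac d{s^2}\Big)\Big)^s-1=\Big(1+\frac1s\Big)^s-1\le e-1. \tag{27}$$

Next, $L(\theta)=\sigma s\rho$ for all $\theta\in\operatorname{supp}(\mu_\rho)$, and also $\operatorname{supp}(\mu_\rho)\subseteq B_0(s)$. Thus, the assumptions of Lemma 2 are satisfied with $\Theta=B_0(s)$, $\beta=e-1$, $v=\sigma s\rho/2=(1/2)\sigma s\sqrt{\log(1+d/s^2)}$ and $T(\theta)=L(\theta)$. An application of Lemma 2 yields

$$\inf_{\hat T}\sup_{\theta\in B_0(s)}\mathbf{P}_\theta\Big(|\hat T-L(\theta)|\ge(1/2)\sigma s\sqrt{\log(1+d/s^2)}\Big)\ge\frac14\exp(1-e),$$

which, combined with Markov's inequality, implies (4).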
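The simplification in (27) can be verified numerically: substituting $e^{\rho^2}=1+d/s^2$ turns the base $1-s/d+(s/d)e^{\rho^2}$ into exactly $1+1/s$, and $(1+1/s)^s<e$. A small check on a grid (the function name `chi2_bound` is hypothetical):

```python
import math


def chi2_bound(s, d, rho):
    """The expression (1 - s/d + (s/d) e^{rho^2})^s - 1 appearing in (27)."""
    return (1.0 - s / d + (s / d) * math.exp(rho**2)) ** s - 1.0


for d in (10, 1000, 10**6):
    for s in (1, 2, 7, d):
        rho = math.sqrt(math.log(1.0 + d / s**2))
        value = chi2_bound(s, d, rho)
        # With rho^2 = log(1 + d/s^2) the base equals 1 + 1/s.
        assert math.isclose(value, (1.0 + 1.0 / s) ** s - 1.0, rel_tol=1e-9)
        assert value <= math.e - 1.0
print("(27) holds on the grid")
```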

7.3 Proof of Theorem 4 

We start by rewriting in a more convenient form the lower rates we need to prove. For this, consider separately the cases $s\ge\sqrt d$ and $s<\sqrt d$.

Case $s\ge\sqrt d$. The lower rate we need to prove in this case is $\min\{\kappa^4,\max(\sigma^2\kappa^2,\sigma^4d)\}$. It is easy to check that we can write it as follows:

$$\min\{\kappa^4,\max(\sigma^2\kappa^2,\sigma^4d)\}=\begin{cases}\sigma^2\kappa^2 & \text{if } \kappa^4\ge\sigma^4d^2,\\ \sigma^4d & \text{if } \sigma^4d\le\kappa^4\le\sigma^4d^2,\\ \kappa^4 & \text{if } \kappa^4\le\sigma^4d.\end{cases} \tag{28}$$
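The case analysis (28) can be checked numerically on random inputs spanning all three regimes. This is a sanity check only; the function names are hypothetical.

```python
import math
import random


def rate_min_max(kappa, sigma, d):
    """min{kappa^4, max(sigma^2 kappa^2, sigma^4 d)}."""
    return min(kappa**4, max(sigma**2 * kappa**2, sigma**4 * d))


def rate_piecewise(kappa, sigma, d):
    """Piecewise form (28)."""
    if kappa**4 >= sigma**4 * d**2:
        return sigma**2 * kappa**2
    if kappa**4 >= sigma**4 * d:
        return sigma**4 * d
    return kappa**4


rng = random.Random(1)
for _ in range(10_000):
    d = rng.randint(1, 10_000)
    sigma = rng.uniform(0.1, 10.0)
    # kappa^2 / (sigma^2 d) ranges over [1e-6, 1e6], covering all three cases.
    kappa = sigma * math.sqrt(d) * 10 ** rng.uniform(-3.0, 3.0)
    assert math.isclose(rate_min_max(kappa, sigma, d),
                        rate_piecewise(kappa, sigma, d), rel_tol=1e-9)
print("(28) verified on random inputs")
```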
Note that the lower rate $\sigma^4d$ for $\sigma^4d\le\kappa^4\le\sigma^4d^2$ follows from the lower rate $\kappa^4$ for $\kappa^4\le\sigma^4d$ and the fact that the minimax risk is a non-decreasing function of $\kappa$. Therefore, to prove Theorem 4 for $s\ge\sqrt d$, it is enough to show that $R_Q^*(B_2(\kappa)\cap B_0(s))\ge c\,(\text{lower rate})$, where $c>0$ is an absolute constant, and

$$\text{lower rate}=\begin{cases}\sigma^2\kappa^2 & \text{if } \kappa^4\ge\sigma^4d^2 \text{ and } s=\sqrt d,\\ \kappa^4 & \text{if } \kappa^4\le\sigma^4d \text{ and } s=\sqrt d.\end{cases} \tag{29}$$

In (29), we assume w.l.o.g. that $\sqrt d$ is an integer, and we replace w.l.o.g. the condition $s\ge\sqrt d$ by $s=\sqrt d$ since the minimax risk is a non-decreasing function of $s$.

Case $s<\sqrt d$. The lower rate we need to prove in this case is

$$\min\{\kappa^4,\ \max(\sigma^2\kappa^2,\ \sigma^4s^2\log^2(1+d/s^2))\}.$$

The same argument as above shows that the analog of representation (28) holds with $d$ replaced by $s^2\log^2(1+d/s^2)$, and that it is enough to prove the lower rate of the form

$$\text{lower rate}=\begin{cases}\sigma^2\kappa^2 & \text{if } \kappa^4\ge\sigma^4s^4\log^4(1+d/s^2) \text{ and } s<\sqrt d,\\ \kappa^4 & \text{if } \kappa^4\le\sigma^4s^2\log^2(1+d/s^2) \text{ and } s<\sqrt d.\end{cases} \tag{30}$$

Thus, to prove Theorem 4 it remains to establish (29) and (30). This is done in the following two propositions. Proposition 1 is used with $b=\log2$, and it is a more general fact than the first lines in (29) and (30) since $B_2(\kappa)\cap B_0(s)\supseteq B_2(\kappa)\cap B_0(1)$ and $s\log(1+d/s^2)\ge\log2$ for $1\le s\le\sqrt d$. Proposition 2 is applied with $b=1/\log2$.

Proposition 1. Let $b>0$. If $\kappa\ge b\sigma$, then

$$\inf_{\hat T}\sup_{\theta\in B_2(\kappa)\cap B_0(1)}\mathbf{P}_\theta\big(|\hat T-Q(\theta)|\ge(3b/8)\sigma\kappa\big)\ge\frac14\exp(-b^2/4),$$

where $\inf_{\hat T}$ denotes the infimum over all estimators of $Q$.

Proposition 2. Let $b>0$. If $\kappa^4\le b^2\sigma^4s^2\log^2(1+d/s^2)$ and $1\le s\le d$, then

$$\inf_{\hat T}\sup_{\theta\in B_2(\kappa)\cap B_0(s)}\mathbf{P}_\theta\big(|\hat T-Q(\theta)|\ge\kappa^2/(2\max(b,1))\big)\ge\frac14\exp(1-e),$$

where $\inf_{\hat T}$ denotes the infimum over all estimators of $Q$.


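7.4 Proof of Proposition 1

Consider the vectors $\theta=(\kappa,0,\dots,0)$ and $\theta'=(\kappa-b\sigma/2,0,\dots,0)$. Clearly, $\theta$ and $\theta'$ belong to $B_2(\kappa)\cap B_0(1)$. Since $b\sigma\le\kappa$, we have

$$
d(\theta,\theta')\triangleq|Q(\theta)-Q(\theta')|=|\sigma^2b^2/4-\kappa\sigma b|\ge3\sigma\kappa b/4,
$$

and the Kullback-Leibler divergence between $\mathbf P_\theta$ and $\mathbf P_{\theta'}$ satisfies

$$
K(\mathbf P_\theta,\mathbf P_{\theta'})=\frac{\|\theta-\theta'\|_2^2}{2\sigma^2}=\frac{b^2}{8}.
$$

We now apply Theorem 2.2 and (2.9) in [37], and use that $\exp(-b^2/8)\ge\exp(-b^2/4)$, to obtain the result.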
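The two-point computation above is easy to check numerically. The following Python sketch (the helper name `two_point_check` is ours and purely illustrative) evaluates the separation $|Q(\theta)-Q(\theta')|$ and the Gaussian Kullback-Leibler divergence for a few parameter values satisfying the hypothesis $\kappa\ge b\sigma$ of Proposition 1.

```python
import math

def two_point_check(kappa, sigma, b):
    """Separation |Q(theta) - Q(theta')| and KL divergence for the two-point pair
    theta = (kappa, 0, ..., 0), theta' = (kappa - b*sigma/2, 0, ..., 0)."""
    t, t_prime = kappa, kappa - b * sigma / 2       # only the first coordinate differs
    separation = abs(t ** 2 - t_prime ** 2)         # |Q(theta) - Q(theta')|
    kl = (t - t_prime) ** 2 / (2 * sigma ** 2)      # ||theta - theta'||^2 / (2 sigma^2)
    return separation, kl

for kappa, sigma, b in [(1.0, 1.0, 1.0), (5.0, 0.3, 2.0), (0.8, 0.1, 7.0)]:
    assert kappa >= b * sigma                       # hypothesis of Proposition 1
    sep, kl = two_point_check(kappa, sigma, b)
    assert sep >= 3 * sigma * kappa * b / 4 - 1e-12
    assert math.isclose(kl, b ** 2 / 8)
```

The separation is exactly twice the deviation level $(3b/8)\sigma\kappa$ in Proposition 1, which is what the reduction to testing requires.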
7.5 Proof of Proposition 2

Set $\rho=\kappa/(\sigma\sqrt{\max(b,1)\,s})$. Then $\rho^2\le\log(1+d/s^2)$ and, due to (27), we have $\chi^2(\mathbf P_{\mu_\rho},\mathbf P_0)\le e-1$. Next, $Q(\theta)=\|\theta\|_2^2=s\sigma^2\rho^2=\kappa^2/\max(b,1)$ for all $\theta\in\operatorname{supp}(\mu_\rho)$, which implies $\operatorname{supp}(\mu_\rho)\subseteq B_2(\kappa)$. We also have $\operatorname{supp}(\mu_\rho)\subseteq B_0(s)$ by construction. Therefore, the assumptions of Lemma 2 are satisfied with $\Theta=B_2(\kappa)\cap B_0(s)$, $\beta=e-1$, $v=\kappa^2/(2\max(b,1))$ and $T(\theta)=Q(\theta)$. An application of Lemma 2 yields the result.
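The calibration of $\rho$ in this proof can be checked with a few lines of Python. The sketch below (the helper `prior_calibration` is a hypothetical name) takes the largest $\kappa$ allowed by the hypothesis of Proposition 2 and verifies that $\rho^2\le\log(1+d/s^2)$ and that $Q(\theta)=s\sigma^2\rho^2=\kappa^2/\max(b,1)$ on the support of the prior.

```python
import math

def prior_calibration(kappa, sigma, b, s, d):
    """Choice of rho in the proof of Proposition 2 and the common value of Q on supp(mu_rho)."""
    rho = kappa / (sigma * math.sqrt(max(b, 1.0) * s))
    q_value = s * sigma ** 2 * rho ** 2      # Q(theta) = ||theta||_2^2 for theta in supp(mu_rho)
    return rho, q_value

d, s, sigma = 10_000, 20, 0.5
for b in (0.25, 1.0, 3.0):
    # Largest kappa allowed by kappa^4 <= b^2 sigma^4 s^2 log^2(1 + d/s^2).
    kappa = math.sqrt(b * sigma ** 2 * s * math.log(1 + d / s ** 2))
    rho, q_value = prior_calibration(kappa, sigma, b, s, d)
    assert rho ** 2 <= math.log(1 + d / s ** 2) * (1 + 1e-12)
    assert math.isclose(q_value, kappa ** 2 / max(b, 1.0))
```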

7.6 Proof of Theorem 2 

In order to prove Theorem 2, we will need the following proposition. 

Proposition 3. Let $b>0$. If $\kappa^2\le b^2\sigma^2s^2\log(1+d/s^2)$ and $1\le s\le d$, then

$$
\inf_{\hat T}\ \sup_{\theta\in B_1(\kappa)\cap B_0(s)}\mathbf P_\theta\bigl(|\hat T-L(\theta)|\ge\kappa/(2\max(b,1))\bigr)\ge\frac14\exp(1-e),
$$

where $\inf_{\hat T}$ denotes the infimum over all estimators.

Proof. We proceed as in the proof of Proposition 2 with the following modifications. We now set $\rho=\kappa/(\max(b,1)\sigma s)$. Then $\rho^2\le\log(1+d/s^2)$, so that $\chi^2(\mathbf P_{\mu_\rho},\mathbf P_0)\le e-1$, and $L(\theta)=\|\theta\|_1=s\sigma\rho=\kappa/\max(b,1)$ for all $\theta\in\operatorname{supp}(\mu_\rho)$. Hence $\operatorname{supp}(\mu_\rho)\subseteq\Theta=B_1(\kappa)\cap B_0(s)$, and Lemma 2 applies with $\beta=e-1$, $v=\kappa/(2\max(b,1))$ and $T(\theta)=L(\theta)$. □

Proof of Theorem 2. First notice that, for an integer $s\in[1,d]$, $0<q\le1$ and $\kappa>0$,

$$
B_1(\kappa)\cap B_0(s)\subseteq B_q(r)\quad\text{if } s^{1-q}\kappa^q=r^q,
\tag{31}
$$

since $\|\theta\|_q^q\le s^{1-q}\|\theta\|_1^q$ for every $s$-sparse $\theta$ by Hölder's inequality.

We will prove the theorem by considering separately the cases $m=0$ and $m\ge1$.

Case $m=0$. Then $r^2<\sigma^2\log(1+d)$, and the assumption of Proposition 3 is satisfied with $s=1$, $b=1$ and $\kappa=r$. Applying Proposition 3 with these parameters and using (31) with $s=1$, we easily deduce that $R^*_L(B_q(r))\ge Cr^2$.

Case $m\ge1$. We now use the embedding (31) with $s=m$. Then

$$
\kappa=rm^{1-1/q}\ge\sigma m\sqrt{\log(1+d/m^2)},
\tag{32}
$$

where the last inequality follows from the definition of $m$. Furthermore, the fact that $m\ge1$ and the definition of $m$ imply

$$
2^{-2/q}r^2m^{-2/q}\le r^2(m+1)^{-2/q}<\sigma^2\log\bigl(1+d/(m+1)^2\bigr)\le\sigma^2\log(1+d/m^2).
\tag{33}
$$

This proves that, for $\kappa$ defined in (32), we have $\kappa^2\le2^{2/q}\sigma^2m^2\log(1+d/m^2)$. Thus, the assumption of Proposition 3 is satisfied with $s=m$, $b=2^{1/q}$ and $\kappa$ defined in (32). Applying Proposition 3 with these parameters and using (31) with $s=m$, we deduce that $R^*_L(B_q(r))\ge C\kappa^2$. This and (32) yield $R^*_L(B_q(r))\ge C\sigma^2m^2\log(1+d/m^2)$, which is the desired lower bound.
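The definition of $m$ is not restated in this section. The Python sketch below assumes, consistently with (32) and (33), that $m$ is the largest integer $s\in\{1,\dots,d\}$ with $\sigma^2\log(1+d/s^2)\le r^2s^{-2/q}$ (and $m=0$ if no such $s$ exists). Under that assumption, the hypothetical helpers compute $m$ and $\kappa=rm^{1-1/q}$ and check the two bounds used above.

```python
import math

def effective_sparsity(r, sigma, d, q):
    """Assumed definition: largest s in [1, d] with sigma^2 log(1 + d/s^2) <= r^2 s^(-2/q), else 0."""
    m = 0
    for s in range(1, d + 1):
        if sigma ** 2 * math.log(1 + d / s ** 2) <= r ** 2 * s ** (-2 / q):
            m = s
    return m

def check_32_33(r, sigma, d, q):
    """Return m and whether kappa = r m^(1 - 1/q) satisfies (32) and the bound derived from (33)."""
    m = effective_sparsity(r, sigma, d, q)
    if not 1 <= m < d:
        return m, None        # the argument for Case m >= 1 uses that m + 1 <= d fails the condition
    log_term = math.log(1 + d / m ** 2)
    kappa = r * m ** (1 - 1 / q)
    lower_ok = kappa >= sigma * m * math.sqrt(log_term) * (1 - 1e-12)        # (32)
    upper_ok = kappa ** 2 <= 2 ** (2 / q) * sigma ** 2 * m ** 2 * log_term  # consequence of (33)
    return m, lower_ok and upper_ok
```

For instance, with $r=50$, $\sigma=1$, $d=10^4$ and $q=1$, this assumed definition gives $m=32$, and both checks pass.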
7.7 Proof of Theorem 6

First notice that, for an integer $s\in[1,d]$, $0<q<2$ and $\kappa>0$,

$$
B_2(\kappa)\cap B_0(s)\subseteq B_q(r)\quad\text{if } s^{1-q/2}\kappa^q=r^q.
\tag{34}
$$

Consider separately the cases $m=0$, $1\le m\le\sqrt d$ and $m>\sqrt d$.

Case $m=0$. Then $r^2<\sigma^2\log(1+d)$, so that the assumption of Proposition 2 is satisfied with $s=1$, $b=1$ and $\kappa=r$. Applying Proposition 2 with these parameters and using (34) with $s=1$ and $\kappa=r$, we get that $R^*_Q(B_q(r))\ge Cr^4$.

Case $1\le m\le\sqrt d$. We start by using (34) with $s=m$. Then

$$
\kappa=rm^{1/2-1/q}\ge\sigma\sqrt{m\log(1+d/m^2)},
\tag{35}
$$

where the last inequality follows from the definition of $m$. For this $\kappa$, using (33) we obtain $\kappa^2\le2^{2/q}\sigma^2m\log(1+d/m^2)$. Thus, the assumption of Proposition 2 is satisfied with $s=m$, $b=2^{2/q}$ and $\kappa$ defined in (35). Applying Proposition 2 with these parameters and using (34) with $s=m$, we deduce that $R^*_Q(B_q(r))\ge C\kappa^4$. This and (35) prove the lower bound $R^*_Q(B_q(r))\ge C\sigma^4m^2\log^2(1+d/m^2)$.

To show that $R^*_Q(B_q(r))\ge C\sigma^2r^2$, we use (34) with $s=1$ and $\kappa=r$. Now $m\ge1$, which implies $r^2\ge\sigma^2\log(1+d)\ge\sigma^2\log2$. Thus, the assumption of Proposition 1 is satisfied with $s=1$, $\kappa=r$ and any $0<b\le\sqrt{\log2}$, leading to the bound $R^*_Q(B_2(r)\cap B_0(1))\ge C\sigma^2r^2$. This inequality and the embedding (34) with $s=1$ yield the result.

Case $m>\sqrt d$. It suffices to note that the argument used above in the case $1\le m\le\sqrt d$ remains valid for $m>\sqrt d$ with $s=\sqrt d$ instead of $s=m$ (assuming w.l.o.g. that $\sqrt d$ is an integer).

7.8 Proof of the lower bound (18) in Theorem 8

Let $s\le\sqrt d$. Set $\rho=\sqrt{\log(1+d/s^2)}$. Due to (27), we have $\chi^2(\mathbf P_{\mu_\rho},\mathbf P_0)\le e-1$. Next, $\|\theta\|_2=\sigma\rho\sqrt s=\sigma\sqrt{s\log(1+d/s^2)}$ for all $\theta\in\operatorname{supp}(\mu_\rho)$, and $\operatorname{supp}(\mu_\rho)\subseteq B_0(s)$ by construction. Therefore, the assumptions of Lemma 2 are satisfied with $\Theta=B_0(s)$, $\beta=e-1$, $v=\sigma\sqrt{s\log(1+d/s^2)}/2$ and $T(\theta)=\|\theta\|_2$. An application of Lemma 2 yields the result for $s\le\sqrt d$. To obtain the lower bound for $s>\sqrt d$, it suffices to consider the case $s=\sqrt d$ (assuming w.l.o.g. that $\sqrt d$ is an integer) and to repeat the above argument with this value of $s$.

7.9 Proof of the lower bound (20) in Theorem 9

If $m=0$, we have $r^2<\sigma^2\log(1+d)$. In this case, set $\rho=r/\sigma$ and $s=1$. Then $\rho<\sqrt{\log(1+d)}$ and, due to (27) with $s=1$, we have $\chi^2(\mathbf P_{\mu_\rho},\mathbf P_0)\le1$. Next, $\|\theta\|_2=\|\theta\|_q=r$ for all $\theta\in\operatorname{supp}(\mu_\rho)$. Thus, $\operatorname{supp}(\mu_\rho)\subseteq B_q(r)$, and the assumptions of Lemma 2 are satisfied with $\Theta=B_q(r)$, $\beta=1$, $v=r/2$ and $T(\theta)=\|\theta\|_2$, which implies the bound $\inf_{\hat T}\sup_{\theta\in B_q(r)}\mathbf E_\theta(\hat T-\|\theta\|_2)^2\ge Cr^2$ for $m=0$.

Case $1\le m\le\sqrt d$. Use the same construction as in the proof of (18), replacing $s$ there with $m$. Then $\|\theta\|_2=\sigma\sqrt{m\log(1+d/m^2)}$ and $\|\theta\|_q=\sigma\rho m^{1/q}=\sigma m^{1/q}\sqrt{\log(1+d/m^2)}$ for all $\theta\in\operatorname{supp}(\mu_\rho)$. By definition of $m$, we have $\sigma m^{1/q}\sqrt{\log(1+d/m^2)}\le r$, guaranteeing that $\operatorname{supp}(\mu_\rho)\subseteq B_q(r)$. Other elements of the argument remain as in the proof of (18).

Case $m>\sqrt d$. Use the same construction as in the proof of (18) with $s=\sqrt d$ (assuming w.l.o.g. that $\sqrt d$ is an integer). Then $\rho=\sqrt{\log2}$, $\|\theta\|_2=\sigma d^{1/4}\sqrt{\log2}$, and $\|\theta\|_q=\sigma d^{1/(2q)}\sqrt{\log2}\le r$ (by definition of $m$) for all $\theta\in\operatorname{supp}(\mu_\rho)$. Other elements of the argument remain as in the proof of (18).

7.10 Proof of the lower bounds in Theorem 12 and in Corollary 2

The following lemma reduces the proof to an argument very close to those of the previous two proofs.

Lemma 3. If $\mu$ is a probability measure on $\Theta$, then

$$
\inf_\Delta\Bigl\{\mathbf P_0(\Delta=1)+\sup_{\theta\in\Theta}\mathbf P_\theta(\Delta=0)\Bigr\}\ge1-\sqrt{\chi^2(\mathbf P_\mu,\mathbf P_0)},
$$

where $\inf_\Delta$ is the infimum over all $\{0,1\}$-valued statistics.

Proof. For any $\{0,1\}$-valued statistic $\Delta$,

$$
\begin{aligned}
\mathbf P_0(\Delta=1)+\sup_{\theta\in\Theta}\mathbf P_\theta(\Delta=0)
&\ge\mathbf P_0(\Delta=1)+\int_\Theta\mathbf P_\theta(\Delta=0)\,\mu(d\theta)\\
&=\mathbf P_0(\Delta=1)+\mathbf P_\mu(\Delta=0)\ge1-V(\mathbf P_\mu,\mathbf P_0)\ge1-\sqrt{\chi^2(\mathbf P_\mu,\mathbf P_0)},
\end{aligned}
$$

where $V(\cdot,\cdot)$ denotes the total variation distance and the last two inequalities follow from the standard properties of this distance (cf. Theorem 2.2(i) and (2.27) in [37]). □

Proof of the lower bound in Theorem 12 for $q=0$. We use a slightly modified version of the argument of Subsection 7.8. As in Subsection 7.8, it suffices to prove the result in the case $s\le\sqrt d$. Then $\psi_\sigma(s,d)=\sigma^2s\log(1+d/s^2)$, so that our aim is to show that the lower rate of testing on $B_0(s)$ is $\lambda=\sigma\sqrt{s\log(1+d/s^2)}$. Fix $A\in(0,1)$. We use Lemma 3 with $\Theta=\Theta_{0,s}(A\lambda)$ and $\mu=\mu_\rho$, where we take $\rho=A\sqrt{\log(1+d/s^2)}$. For all $\theta\in\operatorname{supp}(\mu_\rho)$ we have $\|\theta\|_2=\sigma\rho\sqrt s=A\lambda$, while $\operatorname{supp}(\mu_\rho)\subseteq B_0(s)$ by construction. Hence $\operatorname{supp}(\mu_\rho)\subseteq\Theta_{0,s}(A\lambda)$, so that we can apply Lemma 3. Next, by Lemma 1,

$$
\chi^2(\mathbf P_{\mu_\rho},\mathbf P_0)\le\Bigl(1+\frac sd\bigl[(1+d/s^2)^{A^2}-1\bigr]\Bigr)^s-1\le\Bigl(1+\frac{A^2}s\Bigr)^s-1\le\exp(A^2)-1,
\tag{36}
$$

where we have used that $(1+x)^{A^2}-1\le A^2x$ for $0<A<1$, $x>0$. The last display and Lemma 3 imply that $\mathcal R_{0,s}(A\lambda)\ge1-\sqrt{\exp(A^2)-1}$. Choosing $A_\varepsilon$ such that $\sqrt{\exp(A_\varepsilon^2)-1}=\varepsilon$ proves (23).
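The chain of inequalities in (36) and the choice of $A_\varepsilon$ can be checked numerically. In the Python sketch below (helper names are ours), `chi2_bound_36` evaluates the Lemma 1 bound $(1-s/d+(s/d)e^{\rho^2})^s-1$ with $\rho^2=A^2\log(1+d/s^2)$, and `A_for_epsilon` solves $\sqrt{\exp(A^2)-1}=\varepsilon$ in closed form, $A_\varepsilon=\sqrt{\log(1+\varepsilon^2)}$.

```python
import math

def chi2_bound_36(s, d, A):
    """Lemma 1 bound (1 - s/d + (s/d) e^{rho^2})^s - 1 with rho^2 = A^2 log(1 + d/s^2)."""
    rho2 = A ** 2 * math.log(1 + d / s ** 2)
    return (1 - s / d + (s / d) * math.exp(rho2)) ** s - 1

def A_for_epsilon(eps):
    """A_eps solving sqrt(exp(A^2) - 1) = eps, i.e. A_eps = sqrt(log(1 + eps^2))."""
    return math.sqrt(math.log(1 + eps ** 2))

for s, d in [(1, 10), (5, 400), (30, 10_000), (100, 10_000)]:   # all with s <= sqrt(d)
    for A in (0.1, 0.5, 0.9):
        assert chi2_bound_36(s, d, A) <= math.exp(A ** 2) - 1 + 1e-12
```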

Proof of the lower bound in Theorem 12 for $0<q<2$. It follows along similar lines, but now we modify, in the same spirit, the argument of Subsection 7.9 rather than that of Subsection 7.8. The corresponding $\rho$ in Subsection 7.9 is multiplied by a suitable $A\in(0,1)$, and then Lemma 3 is applied. We omit the details.

Proof of the lower bound in Corollary 2. As explained after the statement of Corollary 2, we only need to consider the case $s\ge\sqrt d$ for the class $\Theta^*$. Then $\lambda=\sigma d^{1/4}/\sqrt s$. Instead of $\mu_\rho$, we now consider a slightly different measure $\tilde\mu_\rho$, which is the uniform distribution on the set of $s$-sparse vectors in $\mathbb R^d$ with nonzero coefficients taking values in $\{-\sigma\rho,\sigma\rho\}$. Then, similarly to Lemma 1,

$$
\chi^2(\mathbf P_{\tilde\mu_\rho},\mathbf P_0)\le\Bigl(1-\frac sd+\frac sd\cosh(\rho^2)\Bigr)^s-1,
\tag{37}
$$

cf. formula (27) in [4]. Fix $A\in(0,1)$. We now use Lemma 3 with $\Theta=\Theta^*(A\lambda)$ and $\mu=\tilde\mu_\rho$, where we take $\rho=Ad^{1/4}/\sqrt s$. For all $\theta\in\operatorname{supp}(\tilde\mu_\rho)$, every nonzero coordinate satisfies $|\theta_j|=\sigma\rho=A\sigma d^{1/4}/\sqrt s=A\lambda$, and also $\operatorname{supp}(\tilde\mu_\rho)\subseteq\{\|\theta\|_0=s\}$ by construction. Hence $\operatorname{supp}(\tilde\mu_\rho)\subseteq\Theta^*(A\lambda)$, so that we can apply Lemma 3. Since $s\ge\sqrt d$, we have $\rho\le1$. Using (37) and the fact that $\cosh(x)\le1+x^2$ for $0\le x\le1$, we obtain

$$
\chi^2(\mathbf P_{\tilde\mu_\rho},\mathbf P_0)\le\Bigl(1+\frac{A^4}s\Bigr)^s-1\le\exp(A^4)-1,
$$

and we conclude the proof in the same way as after (36).
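Both ingredients of this last step, the inequality $\cosh(x)\le1+x^2$ on $[0,1]$ and the resulting bound $\exp(A^4)-1$, can be checked numerically. The following Python sketch (helper name ours) evaluates the right-hand side of (37) with $\rho=Ad^{1/4}/\sqrt s$ for several $s\ge\sqrt d$.

```python
import math

# cosh(x) <= 1 + x^2 on [0, 1], checked on a fine grid.
assert all(math.cosh(k / 1000) <= 1 + (k / 1000) ** 2 for k in range(1001))

def chi2_bound_37(s, d, A):
    """Right-hand side of (37) with rho = A d^(1/4) / sqrt(s)."""
    rho = A * d ** 0.25 / math.sqrt(s)
    return (1 - s / d + (s / d) * math.cosh(rho ** 2)) ** s - 1

for d in (100, 2_500, 10_000):
    for s in (math.isqrt(d), d // 4, d):          # all satisfy s >= sqrt(d)
        for A in (0.2, 0.6, 0.95):
            assert chi2_bound_37(s, d, A) <= math.exp(A ** 4) - 1 + 1e-12
```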


8 Proofs of the upper bounds 

We will use the following lemma. 

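Lemma 4. For $X\sim\mathcal N(0,1)$ and any $x>0$ we have

$$
\frac{4}{\sqrt{2\pi}\,\bigl(x+\sqrt{x^2+4}\bigr)}e^{-x^2/2}\le\mathbf P(|X|>x)\le\frac{4}{\sqrt{2\pi}\,\bigl(x+\sqrt{x^2+2}\bigr)}e^{-x^2/2},
\tag{38}
$$

$$
\mathbf E\bigl(X^2\mathbb 1_{\{|X|>x\}}\bigr)\le\sqrt{\frac2\pi}\Bigl(x+\frac1x\Bigr)e^{-x^2/2},
\tag{39}
$$

$$
\mathbf E\bigl(X^4\mathbb 1_{\{|X|>x\}}\bigr)\le\sqrt{\frac2\pi}\Bigl(x^3+3x+\frac3x\Bigr)e^{-x^2/2}.
\tag{40}
$$

Inequality (38) is due to [6] and [36]. Inequalities (39) and (40) follow from integration by parts.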
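Integration by parts gives the exact identities $\mathbf E(X^2\mathbb 1_{\{|X|>x\}})=2x\varphi(x)+\mathbf P(|X|>x)$ and $\mathbf E(X^4\mathbb 1_{\{|X|>x\}})=2x^3\varphi(x)+3\,\mathbf E(X^2\mathbb 1_{\{|X|>x\}})$, where $\varphi$ is the standard normal density. The Python sketch below uses these identities, together with `math.erfc` for the two-sided tail, to check (38) to (40) on a grid of $x\in(0,8]$. It is a numerical sanity check, not a proof.

```python
import math

def normal_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def tail_prob(x):
    """P(|X| > x) for X ~ N(0, 1)."""
    return math.erfc(x / math.sqrt(2))

def second_moment_tail(x):
    """E(X^2 1{|X| > x}) = 2 x phi(x) + P(|X| > x)."""
    return 2 * x * normal_pdf(x) + tail_prob(x)

def fourth_moment_tail(x):
    """E(X^4 1{|X| > x}) = 2 x^3 phi(x) + 3 E(X^2 1{|X| > x})."""
    return 2 * x ** 3 * normal_pdf(x) + 3 * second_moment_tail(x)

c = math.sqrt(2 / math.pi)
for k in range(1, 801):
    x = k / 100                                   # grid on (0, 8]
    g = math.exp(-x * x / 2)
    lower = 4 * g / (math.sqrt(2 * math.pi) * (x + math.sqrt(x * x + 4)))
    upper = 4 * g / (math.sqrt(2 * math.pi) * (x + math.sqrt(x * x + 2)))
    assert lower <= tail_prob(x) <= upper                              # (38)
    assert second_moment_tail(x) <= c * (x + 1 / x) * g                # (39)
    assert fourth_moment_tail(x) <= c * (x ** 3 + 3 * x + 3 / x) * g   # (40)
```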

In this section, we will use the notation

$$
x=\sqrt{2\log(1+d/s^2)},\qquad\hat S=\{j:|y_j|>\sigma x\},\qquad S=\{j:\theta_j\ne0\}.
$$

We will denote by $C_i$, $i=1,2,\dots$, absolute positive constants, and by $C$ absolute positive constants that can vary from line to line.


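8.1 Proof of the bound (3) in Theorem 1

Clearly, $\mathbf E_\theta\bigl(\sum_{j=1}^dy_j-L(\theta)\bigr)^2=\sigma^2d$. Thus, in view of (5), to prove (3) it is enough to show that for $s\le\sqrt d$ we have

$$
\sup_{\theta\in B_0(s)}\mathbf E_\theta(\hat L^*-L(\theta))^2\le C\sigma^2s^2\log(1+d/s^2),
\tag{41}
$$

where

$$
\hat L^*=\sum_{j=1}^dy_j\mathbb 1_{\{|y_j|>\sigma\sqrt{2\log(1+d/s^2)}\}}
$$

and $C>0$ is an absolute constant. We have

$$
\hat L^*-L(\theta)=\sum_{j\in S}(y_j-\theta_j)-\sum_{j\in S\setminus\hat S}y_j+\sum_{j\in\hat S\setminus S}y_j.
\tag{42}
$$

Thus, for $\theta\in B_0(s)$, we obtain

$$
\begin{aligned}
\mathbf E_\theta(\hat L^*-L(\theta))^2
&\le3\mathbf E\Bigl(\sum_{j\in S}\sigma\xi_j\Bigr)^2+3\mathbf E_\theta\Bigl(\sum_{j\in S}y_j\mathbb 1_{\{|y_j|\le\sigma x\}}\Bigr)^2+3\mathbf E\Bigl(\sum_{j\in S^c}\sigma\xi_j\mathbb 1_{\{|\xi_j|>x\}}\Bigr)^2\\
&\le3\sigma^2\Bigl\{(s+s^2x^2)+\sum_{j\in S^c}\mathbf E\bigl(\xi_j^2\mathbb 1_{\{|\xi_j|>x\}}\bigr)\Bigr\}\\
&\le3\sigma^2\Bigl\{(s+s^2x^2)+d\sqrt{\tfrac2\pi}\Bigl(x+\frac1x\Bigr)e^{-x^2/2}\Bigr\}\qquad\text{(by (39))}\\
&\le3\sigma^2\Bigl\{(s+s^2x^2)+s^2\sqrt{\tfrac2\pi}\Bigl(x+\frac1x\Bigr)\Bigr\},
\end{aligned}
$$

where in the last step we used that $e^{-x^2/2}=(1+d/s^2)^{-1}\le s^2/d$. Now (41) follows since $x\ge\sqrt{2\log2}$ for $s\le\sqrt d$.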
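A small Monte Carlo experiment can illustrate (41). The Python sketch below is ours and purely illustrative: it places the $s$ nonzero coordinates exactly at the threshold $\sigma x$ (one arbitrary choice of a difficult configuration, not a claim about the worst case), estimates the risk of $\hat L^*$ with the standard library's `random` module, and returns the ratio of that estimate to $\sigma^2s^2\log(1+d/s^2)$.

```python
import math
import random

def thresholded_sum(y, sigma, s):
    """The estimator L* from (41): sum of the y_j with |y_j| > sigma*sqrt(2 log(1 + d/s^2))."""
    x = math.sqrt(2 * math.log(1 + len(y) / s ** 2))
    return sum(v for v in y if abs(v) > sigma * x)

def risk_ratio(d=2_500, s=5, sigma=1.0, reps=200, seed=0):
    """Monte Carlo estimate of E(L* - L(theta))^2 divided by sigma^2 s^2 log(1 + d/s^2)."""
    rng = random.Random(seed)
    x = math.sqrt(2 * math.log(1 + d / s ** 2))
    theta = [sigma * x] * s + [0.0] * (d - s)     # s-sparse, nonzero entries at the threshold
    target = sum(theta)                           # L(theta)
    total = 0.0
    for _ in range(reps):
        y = [t + sigma * rng.gauss(0.0, 1.0) for t in theta]
        total += (thresholded_sum(y, sigma, s) - target) ** 2
    return total / reps / (sigma ** 2 * s ** 2 * math.log(1 + d / s ** 2))
```

With the default parameters, the ratio should be of order one, in line with the constant appearing in the last display of the proof.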


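8.2 Proof of Theorem 3

We will consider only the sparse zone $1\le m\le\sqrt d$, since the cases $m=0$ and $m>\sqrt d$ are trivial. Fix $\theta\in B_q(r)$. We will use the notation

$$
\bar d=1+d/m^2,\qquad x=2\sqrt{2\log\bar d},\qquad S=\{j:|\theta_j|>\sigma x/2\}.
$$

Note that

$$
\operatorname{Card}(S)\le\Bigl(\frac{2r}{\sigma x}\Bigr)^q\le2^{-q/2}(m+1)\le2^{1-q/2}m,
\tag{43}
$$

where the first inequality is due to the fact that $\theta\in B_q(r)$ and the second follows from the definition of $m$.

Consider first the bias of $\hat L_q$. Lemma 5 yields

$$
\bigl(\mathbf E_\theta(\hat L_q)-L(\theta)\bigr)^2\le C\Bigl(\sum_{j=1}^d\min(|\theta_j|,\sigma x)\Bigr)^2\le C\Bigl(\sum_{j=1}^d|\theta_j|^q(\sigma x)^{1-q}\Bigr)^2\le C\Bigl(\frac r{\sigma x}\Bigr)^{2q}\sigma^2x^2\le C\sigma^2m^2\log\bar d,
\tag{44}
$$

where we have used (43). Next, the variance of $\hat L_q$ has the form

$$
\operatorname{Var}_\theta(\hat L_q)=\sum_{j=1}^d\operatorname{Var}_\theta\bigl(y_j\mathbb 1_{\{|y_j|>\sigma x\}}\bigr).
$$

Here, for indices $j$ belonging to $S$, using (43) we have

$$
\sum_{j\in S}\operatorname{Var}_\theta\bigl(y_j\mathbb 1_{\{|y_j|>\sigma x\}}\bigr)\le2\sum_{j\in S}\operatorname{Var}_\theta(y_j)+2\sum_{j\in S}\operatorname{Var}_\theta\bigl(y_j\mathbb 1_{\{|y_j|\le\sigma x\}}\bigr)\le2\operatorname{Card}(S)\sigma^2(1+x^2)\le C\sigma^2m\log\bar d.
\tag{45}
$$

For indices $j$ belonging to $S^c$, we have $|\theta_j|\le\sigma x/2$, so that $|y_j|>\sigma x$ implies $|\xi_j|>x/2$. Therefore,

$$
\sum_{j\in S^c}\operatorname{Var}_\theta\bigl(y_j\mathbb 1_{\{|y_j|>\sigma x\}}\bigr)\le\sum_{j\in S^c}\mathbf E_\theta\bigl(y_j^2\mathbb 1_{\{|y_j|>\sigma x\}}\bigr)\le2\sum_{j\in S^c}\theta_j^2+2\sigma^2\sum_{j\in S^c}\mathbf E\bigl(\xi_j^2\mathbb 1_{\{|\xi_j|>x/2\}}\bigr).
\tag{46}
$$

Using the same argument as in (44), we find

$$
\sum_{j\in S^c}\theta_j^2\le\Bigl(\sum_{j=1}^d\min(|\theta_j|,\sigma x)\Bigr)^2\le C\sigma^2m^2\log\bar d.
\tag{47}
$$

Finally, (39) implies

$$
\sigma^2\sum_{j\in S^c}\mathbf E\bigl(\xi_j^2\mathbb 1_{\{|\xi_j|>x/2\}}\bigr)\le C\sigma^2\frac d{\bar d}\sqrt{\log\bar d}\le C\sigma^2m^2\log\bar d,
\tag{48}
$$

where for the last inequality we have used that $d/\bar d\le m^2$ and that $\log\bar d\ge\log2$ for $m\le\sqrt d$. Combining (45)-(48), we obtain that

$$
\operatorname{Var}_\theta(\hat L_q)\le C\sigma^2m^2\log\bar d.
$$

Together with the bias bound (44), this yields the desired result:

$$
\sup_{\theta\in B_q(r)}\mathbf E_\theta(\hat L_q-L(\theta))^2\le C\sigma^2m^2\log\bar d.
$$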

8.3 Proof of Theorem 5

The upper bound $\kappa^4$ for $\kappa^4\le\psi_\sigma(s,d,\kappa)$ is trivial, since the maximal risk of the zero estimator over $B_2(\kappa)$ is equal to $\kappa^4$. Let now $\kappa^4>\psi_\sigma(s,d,\kappa)$. We analyze separately the cases $s>\sqrt d$ and $s\le\sqrt d$.

Case $s>\sqrt d$ and $\kappa^4>\psi_\sigma(s,d,\kappa)$. Then $\hat Q=\hat Q^*$, and Theorem 5 claims a bound with the rate $\psi_\sigma(s,d,\kappa)=\max(\sigma^2\kappa^2,\sigma^4d)$. To prove this bound, note that

$$
\hat Q^*-Q(\theta)=2\sigma\sum_{j=1}^d\theta_j\xi_j+\sigma^2\sum_{j=1}^d(\xi_j^2-1).
$$

Thus, for all $\theta\in B_2(\kappa)$,

$$
\mathbf E_\theta(\hat Q^*-Q(\theta))^2=4\sigma^2\mathbf E\Bigl(\sum_{j=1}^d\theta_j\xi_j\Bigr)^2+\sigma^4\mathbf E\Bigl(\sum_{j=1}^d(\xi_j^2-1)\Bigr)^2=4\sigma^2\|\theta\|_2^2+2\sigma^4d\le6\max(\sigma^2\kappa^2,\sigma^4d).
\tag{49}
$$

Case $s\le\sqrt d$ and $\kappa^4>\psi_\sigma(s,d,\kappa)$. Then $\hat Q=\hat Q'$, where

$$
\hat Q'=\sum_{j=1}^d(y_j^2-\alpha_s\sigma^2)\mathbb 1_{\{|y_j|>\sigma\sqrt{2\log(1+d/s^2)}\}},
$$

and $\psi_\sigma(s,d,\kappa)=\max\bigl(\sigma^2\kappa^2,\sigma^4s^2\log^2(1+d/s^2)\bigr)$. Here and below in this proof, we set for brevity $\alpha=\alpha_s$.

Since $s\le\sqrt d$, we have $x\ge\sqrt{2\log2}$. Using Lemma 4, we find that, for $s\le\sqrt d$,

$$
\alpha=\frac{\mathbf E\bigl(X^2\mathbb 1_{\{|X|>x\}}\bigr)}{\mathbf P(|X|>x)}\le\Bigl(x+\frac2x\Bigr)(x+1)\le5x^2=10\log(1+d/s^2),
\tag{50}
$$

where $X\sim\mathcal N(0,1)$.
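For concreteness, the constant $\alpha$ in (50) has the closed form $\alpha=1+2x\varphi(x)/\mathbf P(|X|>x)$, where $\varphi$ is the standard normal density; this follows from $\mathbf E(X^2\mathbb 1_{\{|X|>x\}})=2x\varphi(x)+\mathbf P(|X|>x)$. The Python sketch below (helper name ours) evaluates it and checks the chain in (50) for several values of $d/s^2\ge1$, which is the range corresponding to $s\le\sqrt d$.

```python
import math

def alpha_threshold(x):
    """alpha such that E[(X^2 - alpha) 1{|X| > x}] = 0 for X ~ N(0, 1)."""
    phi = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    tail = math.erfc(x / math.sqrt(2))              # P(|X| > x)
    return 1 + 2 * x * phi / tail

for ratio in (1, 2, 10, 100, 10 ** 4, 10 ** 8):    # ratio = d / s^2 >= 1 when s <= sqrt(d)
    x = math.sqrt(2 * math.log(1 + ratio))
    a = alpha_threshold(x)
    assert a <= (x + 2 / x) * (x + 1) <= 5 * x * x  # the chain in (50)
```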


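Similarly to (42), we get

$$
\hat Q'-Q(\theta)=\sum_{j\in S}(y_j^2-\alpha\sigma^2-\theta_j^2)-\sum_{j\in S\setminus\hat S}(y_j^2-\alpha\sigma^2)+\sum_{j\in\hat S\setminus S}(y_j^2-\alpha\sigma^2),
$$

and thus

$$
\mathbf E_\theta(\hat Q'-Q(\theta))^2\le3\mathbf E_\theta\Bigl[\Bigl(\sum_{j\in S}(y_j^2-\alpha\sigma^2-\theta_j^2)\Bigr)^2+\Bigl(\sum_{j\in S\setminus\hat S}(y_j^2-\alpha\sigma^2)\Bigr)^2+\Bigl(\sum_{j\in\hat S\setminus S}(y_j^2-\alpha\sigma^2)\Bigr)^2\Bigr].
$$

For $\theta\in B_2(\kappa)\cap B_0(s)$, the first term on the right-hand side satisfies

$$
\begin{aligned}
\mathbf E_\theta\Bigl(\sum_{j\in S}(y_j^2-\alpha\sigma^2-\theta_j^2)\Bigr)^2
&=\mathbf E\Bigl(\sum_{j\in S}\bigl(2\sigma\theta_j\xi_j+\sigma^2(\xi_j^2-\alpha)\bigr)\Bigr)^2\\
&\le4\sigma^2\|\theta\|_2^2+2\sigma^4s^2(\alpha^2+3)\le4\sigma^2\|\theta\|_2^2+2\sigma^4s^2(25x^4+3)\\
&\le C_1\bigl(\sigma^2\|\theta\|_2^2+\sigma^4s^2\log^2(1+d/s^2)\bigr)\\
&\le C_1\bigl(\sigma^2\kappa^2+\sigma^4s^2\log^2(1+d/s^2)\bigr).
\end{aligned}
\tag{51}
$$

Furthermore, by definition of $\hat S$,

$$
\mathbf E_\theta\Bigl(\sum_{j\in S\setminus\hat S}(y_j^2-\alpha\sigma^2)\Bigr)^2\le4\sigma^4s^2\log^2(1+d/s^2)+2\sigma^4s^2\alpha^2\le C_2\sigma^4s^2\log^2(1+d/s^2)
$$

for any $\theta\in B_0(s)$. Finally, $\alpha$ was chosen such that, for any $j\notin S$,

$$
\mathbf E_\theta\bigl((y_j^2-\alpha\sigma^2)\mathbb 1_{\{|y_j|>\sigma x\}}\bigr)=\sigma^2\mathbf E\bigl((X^2-\alpha)\mathbb 1_{\{|X|>x\}}\bigr)=0,
$$

where $X\sim\mathcal N(0,1)$. Thus, by independence, we have

$$
\begin{aligned}
\mathbf E_\theta\Bigl(\sum_{j\in\hat S\setminus S}(y_j^2-\alpha\sigma^2)\Bigr)^2
&=\sum_{j\notin S}\mathbf E_\theta\bigl((y_j^2-\alpha\sigma^2)^2\mathbb 1_{\{|y_j|>\sigma x\}}\bigr)\\
&\le\sigma^4d\,\mathbf E\bigl((X^2-\alpha)^2\mathbb 1_{\{|X|>x\}}\bigr)\\
&\le16\sigma^4d\,\mathbf E\bigl(X^4\mathbb 1_{\{|X|>x\}}\bigr)\\
&\le C_3\sigma^4dx^3e^{-x^2/2}\\
&\le C_4\sigma^4s^2x^3\le(C_4/\sqrt{2\log2})\,\sigma^4s^2x^4\le C_5\sigma^4s^2\log^2(1+d/s^2),
\end{aligned}
\tag{52}
$$

where we have used that $\alpha\le5X^2$ on the event $\{|X|>x\}$ (so that $|X^2-\alpha|\le4X^2$ there), inequality (40), the bound $e^{-x^2/2}\le s^2/d$ and the fact that $x\ge\sqrt{2\log2}$. Combining the above displays yields

$$
\sup_{\theta\in B_2(\kappa)\cap B_0(s)}\mathbf E_\theta(\hat Q'-Q(\theta))^2\le C_6\max\bigl(\sigma^2\kappa^2,\sigma^4s^2\log^2(1+d/s^2)\bigr).
$$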


8.4 Proof of Theorem 7

Fix $\theta\in B_q(r)$. We will prove the theorem only for $1\le m\le\sqrt d$, since the case $m=0$ is trivial and the result for the case $m>\sqrt d$ follows from (49) and the fact that $\|\theta\|_2\le\|\theta\|_q\le r$. In this proof, we write for brevity $\alpha=\alpha_m$, $\bar d=1+d/m^2$ and $x=2(2\log\bar d)^{1/2}$. Let $J\subseteq\{1,\dots,d\}$ be the set of indices corresponding to the $m$ components of $\theta$ that are largest in absolute value, and let $|\theta|_{(j)}$ denote the $j$th largest absolute value of the components of $\theta$. It is easy to see that

$$
|\theta|_{(j)}\le rj^{-1/q},\qquad j=1,\dots,d.
$$

This implies

$$
\sum_{j\in J^c}\theta_j^2=\sum_{j\ge m+1}|\theta|_{(j)}^2\le r^2\sum_{j\ge m+1}j^{-2/q}\le Cr^2m^{1-2/q}.
$$

Therefore, since $\theta\in B_q(r)$ and due to the definition of $m$,

$$
\sum_{j\in J^c}\theta_j^2\le Cr^2m^{1-2/q}\le C\sigma^2m\log\bar d,
\tag{53}
$$

and

$$
\forall\,j\in J^c:\quad|\theta_j|\le r(m+1)^{-1/q}\le\sigma\sqrt{\log\bar d}\le\sigma x/2.
\tag{54}
$$
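The two facts behind (53), the decay $|\theta|_{(j)}\le rj^{-1/q}$ and the tail estimate $\sum_{j>m}j^{-2/q}\le m^{1-2/q}/(2/q-1)$ for $0<q<2$ (an integral comparison, which is one admissible choice of the constant $C$), can be checked on arbitrary vectors. The Python sketch below (helper name and test vectors are ours, chosen for illustration) takes $r=\|\theta\|_q$ so that $\theta\in B_q(r)$ holds by construction.

```python
import math
import random

def check_weak_lq_bounds(theta, q, m):
    """Check |theta|_(j) <= r j^(-1/q) and the tail bound behind (53), with r = ||theta||_q."""
    r = sum(abs(t) ** q for t in theta) ** (1 / q)
    ordered = sorted((abs(t) for t in theta), reverse=True)
    decay_ok = all(v <= r * (j + 1) ** (-1 / q) * (1 + 1e-12) for j, v in enumerate(ordered))
    tail = sum(v * v for v in ordered[m:])                  # sum over J^c
    tail_bound = r ** 2 * m ** (1 - 2 / q) / (2 / q - 1)    # integral comparison, 0 < q < 2
    return decay_ok and tail <= tail_bound * (1 + 1e-12)

rng = random.Random(1)
for q in (0.3, 1.0, 1.5):
    for m in (1, 5, 20):
        theta = [rng.gauss(0.0, 1.0) ** 3 for _ in range(500)]   # arbitrary heavy-tailed vector
        assert check_weak_lq_bounds(theta, q, m)
```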


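We have

$$
\hat Q_q-Q(\theta)=\sum_{j\in J}(y_j^2-\alpha\sigma^2-\theta_j^2)-\sum_{j\in J\setminus\hat S}(y_j^2-\alpha\sigma^2)+\sum_{j\in\hat S\setminus J}(y_j^2-\alpha\sigma^2)-\sum_{j\in J^c}\theta_j^2.
\tag{55}
$$

Consider the first sum on the right-hand side of (55). Since $\operatorname{Card}(J)=m$ and $\alpha\le40\log\bar d$ (which is obtained analogously to (50), recalling that now $\alpha=\alpha_m$ instead of $\alpha=\alpha_s$), the same argument as in (51) leads to

$$
\mathbf E_\theta\Bigl(\sum_{j\in J}(y_j^2-\alpha\sigma^2-\theta_j^2)\Bigr)^2\le C\bigl(\sigma^2\|\theta\|_2^2+\sigma^4m^2\log^2\bar d\bigr).
\tag{56}
$$

Next, consider the second sum on the right-hand side of (55). By definition of $\hat S$,

$$
\mathbf E_\theta\Bigl(\sum_{j\in J\setminus\hat S}(y_j^2-\alpha\sigma^2)\Bigr)^2\le\Bigl(\sum_{j\in J}\sigma^2(x^2+\alpha)\Bigr)^2\le C\sigma^4m^2\log^2\bar d.
\tag{57}
$$

Let us now turn to the third sum on the right-hand side of (55). The bias-variance decomposition yields

$$
\mathbf E_\theta\Bigl(\sum_{j\in\hat S\setminus J}(y_j^2-\alpha\sigma^2)\Bigr)^2=\sum_{j\in J^c}\operatorname{Var}_\theta\bigl((y_j^2-\alpha\sigma^2)\mathbb 1_{\{|y_j|>\sigma x\}}\bigr)+\Bigl(\sum_{j\in J^c}\mathbf E_\theta\bigl((y_j^2-\alpha\sigma^2)\mathbb 1_{\{|y_j|>\sigma x\}}\bigr)\Bigr)^2.
$$

Here,

$$
\begin{aligned}
\operatorname{Var}_\theta\bigl((y_j^2-\alpha\sigma^2)\mathbb 1_{\{|y_j|>\sigma x\}}\bigr)
&\le\mathbf E_\theta\bigl((y_j^2-\alpha\sigma^2)^2\mathbb 1_{\{|y_j|>\sigma x\}}\bigr)\\
&\le C\,\mathbf E_\theta\bigl((\theta_j^4+\sigma^4\xi_j^4+\alpha^2\sigma^4)\mathbb 1_{\{|y_j|>\sigma x\}}\bigr)\\
&\le C\Bigl(\theta_j^4+\sigma^4\mathbf E\bigl((\xi_j^4+\alpha^2)\mathbb 1_{\{|\xi_j|>x/2\}}\bigr)\Bigr)\qquad\text{(by (54))}.
\end{aligned}
$$

Using now the same argument as in (52) to bound $\mathbf E(\xi_j^4\mathbb 1_{\{|\xi_j|>x/2\}})$, we obtain

$$
\sum_{j\in J^c}\operatorname{Var}_\theta\bigl((y_j^2-\alpha\sigma^2)\mathbb 1_{\{|y_j|>\sigma x\}}\bigr)\le C\Bigl(\sum_{j\in J^c}\theta_j^4+\sigma^4m^2\log^2\bar d\Bigr)\le C\Bigl(\Bigl(\sum_{j\in J^c}\theta_j^2\Bigr)^2+\sigma^4m^2\log^2\bar d\Bigr).
$$

Furthermore, by Lemma 6,

$$
\Bigl|\sum_{j\in J^c}\mathbf E_\theta\bigl((y_j^2-\alpha\sigma^2)\mathbb 1_{\{|y_j|>\sigma x\}}\bigr)\Bigr|\le C\sum_{j\in J^c}\theta_j^2.
$$

Combining the above displays leads to the following bound:

$$
\mathbf E_\theta\Bigl(\sum_{j\in\hat S\setminus J}(y_j^2-\alpha\sigma^2)\Bigr)^2\le C\Bigl(\Bigl(\sum_{j\in J^c}\theta_j^2\Bigr)^2+\sigma^4m^2\log^2\bar d\Bigr).
\tag{58}
$$

From (55)-(58) we deduce that

$$
\mathbf E_\theta(\hat Q_q-Q(\theta))^2\le C\Bigl(\sigma^2\|\theta\|_2^2+\Bigl(\sum_{j\in J^c}\theta_j^2\Bigr)^2+\sigma^4m^2\log^2\bar d\Bigr).
$$

The result now follows if we use (53) and note that $\|\theta\|_2\le\|\theta\|_q\le r$.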
8.5 Proof of the upper bound (17) in Theorem 8

Fix $\theta\in B_0(s)$ and set for brevity $\tau=\psi_\sigma(s,d)^{1/2}$. We will bound the risk $\mathbf E_\theta(\hat N-\|\theta\|_2)^2$ separately for the cases $\|\theta\|_2\le\tau$ and $\|\theta\|_2>\tau$.

Case $\|\theta\|_2\le\tau$. Using the elementary inequality $(a-b)^2\le2(a^2-b^2)+4b^2$, we find

$$
\mathbf E_\theta(\hat N-\|\theta\|_2)^2\le2\mathbf E_\theta\bigl(\max\{\hat Q_\bullet,0\}-Q(\theta)\bigr)+4Q(\theta)\le2\bigl(\mathbf E_\theta(\hat Q_\bullet-Q(\theta))^2\bigr)^{1/2}+4\tau^2.
$$

Note that $\hat Q_\bullet=\hat Q$ if we set $\kappa=\tau$ in the definition of $\hat Q$. Furthermore, $\theta\in B_0(s)$ and, in the case under consideration, $\theta$ belongs to $B_2(\tau)$. Now, due to Theorem 5, for all $\theta\in B_2(\tau)\cap B_0(s)$ we have

$$
\mathbf E_\theta(\hat Q_\bullet-Q(\theta))^2\le C\psi_\sigma(s,d,\tau).
$$

Using this inequality and the fact that $\psi_\sigma(s,d,\tau)=(\psi_\sigma(s,d))^2$, we obtain the desired rate:

$$
\mathbf E_\theta(\hat N-\|\theta\|_2)^2\le C_7\psi_\sigma(s,d)+4\tau^2=(C_7+4)\psi_\sigma(s,d).
$$

Case $\|\theta\|_2>\tau$. Using the elementary inequality $(a-b)^2\le(a^2-b^2)^2/b^2$, valid for all $a\ge0$ and $b>0$, we find

$$
\mathbf E_\theta(\hat N-\|\theta\|_2)^2\le\frac{\mathbf E_\theta(\hat Q_\bullet-Q(\theta))^2}{\|\theta\|_2^2}.
$$

Now, we bound $\mathbf E_\theta(\hat Q_\bullet-Q(\theta))^2$ along the lines of the proof of Theorem 5. In particular, if $s>\sqrt d$, we have $\hat Q_\bullet=\hat Q^*$ and $\tau=\sigma d^{1/4}$, and using (49) we obtain

$$
\frac{\mathbf E_\theta(\hat Q_\bullet-Q(\theta))^2}{\|\theta\|_2^2}=4\sigma^2+\frac{2\sigma^4d}{\|\theta\|_2^2}\le4\sigma^2+\frac{2\sigma^4d}{\tau^2}\le6\sigma^2\sqrt d,
$$

which is the desired rate. If $s\le\sqrt d$, we have $\hat Q_\bullet=\hat Q'$ and $\tau=\sigma\sqrt{s\log(1+d/s^2)}$, and using (51) and the subsequent bounds in the proof of Theorem 5, we obtain

$$
\frac{\mathbf E_\theta(\hat Q_\bullet-Q(\theta))^2}{\|\theta\|_2^2}\le\frac{3\bigl(C_1\sigma^2\|\theta\|_2^2+(C_1+C_2+C_5)\sigma^4s^2\log^2(1+d/s^2)\bigr)}{\|\theta\|_2^2}\le C_8\Bigl(\sigma^2+\frac{\sigma^4s^2\log^2(1+d/s^2)}{\tau^2}\Bigr)\le C_9\sigma^2s\log(1+d/s^2),
\tag{59}
$$

which is again the desired rate.


8.6 Proof of the upper bound (19) in Theorem 9

The case $m=0$ is trivial. For $m\ge1$, we use the same method of reduction to the risk of estimators of $Q$ as in the proof of (17). The difference is that now we set $\tau$ equal to the square root of the corresponding rate on $B_q(r)$, we replace $s$ by $m$, and we apply Theorem 7 rather than Theorem 5. In particular, an analog of (59) with $s=m$ is obtained using (56).
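8.7 Proof of Theorem 10

As in the proof of the bound (3) and with the same notation, we have, for $\theta\in B_0(s)$,

$$
\begin{aligned}
\mathbf E_\theta(\hat L-L(\theta))^2
&\le3\mathbf E\Bigl(\sum_{j\in S}\sigma\xi_j\Bigr)^2+3\mathbf E_\theta\Bigl(\sum_{j\in S}y_j\mathbb 1_{\{|y_j|\le\hat\sigma x\}}\Bigr)^2+3\mathbf E_\theta\Bigl(\sum_{j\in S^c}\sigma\xi_j\mathbb 1_{\{\sigma|\xi_j|>\hat\sigma x\}}\Bigr)^2\\
&\le3\Bigl\{\bigl(s\sigma^2+s^2\mathbf E_\theta(\hat\sigma^2)x^2\bigr)+\sigma^2\sum_{j\in S^c}\mathbf E_\theta\bigl(\xi_j^2\mathbb 1_{\{\sigma|\xi_j|>\hat\sigma x\}}\bigr)\Bigr\}.
\end{aligned}
$$

Here,

$$
\mathbf E_\theta\bigl(\xi_j^2\mathbb 1_{\{\sigma|\xi_j|>\hat\sigma\sqrt{2\log(1+d/s^2)}\}}\bigr)=\mathbf E_\theta\bigl(\xi_j^2\mathbb 1_{\{\sigma|\xi_j|>\hat\sigma\sqrt{2\log(1+d/s^2)}\}}\mathbb 1_{\{\hat\sigma>\sigma\}}\bigr)+\mathbf E_\theta\bigl(\xi_j^2\mathbb 1_{\{\sigma|\xi_j|>\hat\sigma\sqrt{2\log(1+d/s^2)}\}}\mathbb 1_{\{\hat\sigma\le\sigma\}}\bigr).
$$

The first term on the right-hand side satisfies

$$
\mathbf E_\theta\bigl(\xi_j^2\mathbb 1_{\{\sigma|\xi_j|>\hat\sigma\sqrt{2\log(1+d/s^2)}\}}\mathbb 1_{\{\hat\sigma>\sigma\}}\bigr)\le\mathbf E\bigl(\xi_j^2\mathbb 1_{\{|\xi_j|>\sqrt{2\log(1+d/s^2)}\}}\bigr)\le C\frac{s^2}d\sqrt{\log(1+d/s^2)}\qquad\text{(by (39))}.
$$

For the second term, we use Lemma 7 to get

$$
\mathbf E_\theta\bigl(\xi_j^2\mathbb 1_{\{\sigma|\xi_j|>\hat\sigma\sqrt{2\log(1+d/s^2)}\}}\mathbb 1_{\{\hat\sigma\le\sigma\}}\bigr)\le\sqrt{\mathbf E(\xi_j^4)}\sqrt{\mathbf P_\theta(\hat\sigma\le\sigma)}\le C\sqrt d\exp(-\sqrt d/C).
$$

Combining the above displays and using Lemma 7 to bound $\mathbf E_\theta(\hat\sigma^2)$, we obtain

$$
\mathbf E_\theta(\hat L-L(\theta))^2\le C\sigma^2s^2\log(1+d/s^2).
$$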


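8.8 Proof of Theorem 11

Set $\hat S=\{j:|y_j|>\hat\sigma\sqrt{2\log d}\}$. Similarly to the proof of Theorem 5, we get

$$
\mathbf E_\theta(\hat Q-Q(\theta))^2\le3\mathbf E_\theta\Bigl[\Bigl(\sum_{j\in S}(y_j^2-\theta_j^2)\Bigr)^2+\Bigl(\sum_{j\in S\setminus\hat S}y_j^2\Bigr)^2+\Bigl(\sum_{j\in\hat S\setminus S}y_j^2\Bigr)^2\Bigr].
$$

We bound separately the three terms on the right-hand side. For $\theta\in B_2(\kappa)\cap B_0(s)$, the first term satisfies, due to (51) with $\alpha=0$,

$$
\mathbf E_\theta\Bigl(\sum_{j\in S}(y_j^2-\theta_j^2)\Bigr)^2\le C\bigl(\sigma^2\|\theta\|_2^2+\sigma^4s^2\bigr)\le C\bigl(\sigma^2\kappa^2+\sigma^4s^2\bigr).
\tag{60}
$$

Using Lemma 7, we find

$$
\mathbf E_\theta\Bigl(\sum_{j\in S\setminus\hat S}y_j^2\Bigr)^2=\mathbf E_\theta\Bigl(\sum_{j\in S}y_j^2\mathbb 1_{\{|y_j|\le\hat\sigma\sqrt{2\log d}\}}\Bigr)^2\le s^2\mathbf E_\theta(\hat\sigma^4)(2\log d)^2\le C\sigma^4s^2\log^2d.
\tag{61}
$$

Finally, we write the third term as follows:

$$
\mathbf E_\theta\Bigl(\sum_{j\in\hat S\setminus S}y_j^2\Bigr)^2=\mathbf E_\theta\Bigl(\sum_{j\in S^c}\sigma^2\xi_j^2\mathbb 1_{\{\sigma|\xi_j|>\hat\sigma\sqrt{2\log d}\}}\Bigr)^2\le2(A_1+A_2),
\tag{62}
$$

where

$$
A_1=\mathbf E_\theta\Bigl(\sum_{j=1}^d\sigma^2\xi_j^2\mathbb 1_{\{\sigma|\xi_j|>\hat\sigma\sqrt{2\log d}\}}\mathbb 1_{\{\hat\sigma\ge\sqrt2\sigma\}}\Bigr)^2,\qquad A_2=\mathbf E_\theta\Bigl(\sum_{j=1}^d\sigma^2\xi_j^2\mathbb 1_{\{\hat\sigma<\sqrt2\sigma\}}\Bigr)^2.
$$

Using (40), we obtain

$$
A_1\le\sigma^4\mathbf E\Bigl(\sum_{j=1}^d\xi_j^2\mathbb 1_{\{|\xi_j|>2\sqrt{\log d}\}}\Bigr)^2\le2\sigma^4d^2\,\mathbf E\bigl(X^4\mathbb 1_{\{|X|>2\sqrt{\log d}\}}\bigr)\le C\sigma^4(\log d)^{3/2},
\tag{63}
$$

where $X\sim\mathcal N(0,1)$. Next,

$$
A_2\le\sigma^4\mathbf E_\theta\Bigl(\sum_{j=1}^d\xi_j^2\mathbb 1_{\{\hat\sigma<\sqrt2\sigma\}}\Bigr)^2\le\sigma^4d^2\max_j\mathbf E_\theta\bigl(\xi_j^4\mathbb 1_{\{\hat\sigma<\sqrt2\sigma\}}\bigr).
$$

Using (40), we find

$$
\mathbf E_\theta\bigl(\xi_j^4\mathbb 1_{\{\hat\sigma<\sqrt2\sigma\}}\bigr)\le\mathbf E\bigl(\xi_j^4\mathbb 1_{\{|\xi_j|>2\sqrt{\log d}\}}\bigr)+\mathbf E_\theta\bigl(\xi_j^4\mathbb 1_{\{|\xi_j|\le2\sqrt{\log d}\}}\mathbb 1_{\{\hat\sigma<\sqrt2\sigma\}}\bigr)\le\frac C{d^2}(\log d)^{3/2}+16(\log d)^2\,\mathbf P_\theta(\hat\sigma<\sqrt2\sigma).
$$

The last two displays and the bound for $\mathbf P_\theta(\hat\sigma<\sqrt2\sigma)$ from Lemma 7 yield

$$
A_2\le C\sigma^4(\log d)^{3/2}.
\tag{64}
$$

Combining (60)-(64) proves the theorem.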

9 Appendix: Auxiliary lemmas

Proof of Lemma 1. We first follow the lines of the proof of Theorem 7 in [9] and then apply a result of [2] in the same spirit as it was done in [4]. Let $\varphi_\sigma$ be the density of the normal distribution with mean 0 and variance $\sigma^2$. For $I\in\mathcal S(s,d)$, let

$$
g_I(y_1,\dots,y_d)=\prod_{j=1}^d\varphi_\sigma(y_j-\theta_j^I),
$$

where $\theta_j^I=\sigma\rho\mathbb 1_{\{j\in I\}}$. The density of $\mathbf P_{\mu_\rho}$ is

$$
g=\binom ds^{-1}\sum_{I\in\mathcal S(s,d)}g_I,
$$

and we can write

$$
\chi^2(\mathbf P_{\mu_\rho},\mathbf P_0)=\int\Bigl(\frac{d\mathbf P_{\mu_\rho}}{d\mathbf P_0}\Bigr)^2d\mathbf P_0-1=\int\frac{g^2}f-1,
$$

where $f$ is the density of $d$ i.i.d. normal random variables with mean 0 and variance $\sigma^2$. Now,

$$
\frac{g^2}f=\binom ds^{-2}\sum_{I\in\mathcal S(s,d)}\ \sum_{I'\in\mathcal S(s,d)}\frac{g_Ig_{I'}}f.
$$

It is easy to see that

$$
\int\frac{g_Ig_{I'}}f=\exp\bigl(\rho^2\operatorname{Card}(I\cap I')\bigr),
$$

which implies

$$
\int\frac{g^2}f=\mathbf E\exp(\rho^2J),
$$

where $J$ is a random variable with hypergeometric distribution,

$$
\mathbf P(J=j)=\frac{\binom sj\binom{d-s}{s-j}}{\binom ds},\qquad j=0,\dots,s.
$$

As shown in [2], $J$ coincides in distribution with the conditional expectation $\mathbf E[Z|\mathcal B]$, where $Z$ is a binomial random variable with parameters $(s,s/d)$ and $\mathcal B$ is a suitable $\sigma$-algebra. This fact and Jensen's inequality lead to the following bound, which implies the lemma:

$$
\int\frac{g^2}f\le\mathbf E\exp(\rho^2Z)=\Bigl(1-\frac sd+\frac sde^{\rho^2}\Bigr)^s.
$$
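The proof reduces $\chi^2(\mathbf P_{\mu_\rho},\mathbf P_0)$ to $\mathbf E\exp(\rho^2J)-1$ with $J$ hypergeometric, which can be evaluated exactly for moderate $d$. The Python sketch below (helper names ours) compares this exact value with the binomial bound of Lemma 1 and checks that, for $\rho^2=\log(1+d/s^2)$, both stay below $e-1$, as used in (27).

```python
import math

def chi2_exact(d, s, rho):
    """chi^2(P_mu_rho, P_0) = E exp(rho^2 J) - 1 with J hypergeometric."""
    total = math.comb(d, s)
    return sum(
        math.comb(s, j) * math.comb(d - s, s - j) / total * math.exp(rho ** 2 * j)
        for j in range(0, s + 1)
    ) - 1

def chi2_binomial_bound(d, s, rho):
    """Upper bound (1 - s/d + (s/d) e^{rho^2})^s - 1 obtained via Jensen's inequality."""
    return (1 - s / d + (s / d) * math.exp(rho ** 2)) ** s - 1

for d, s in [(10, 1), (50, 3), (400, 20), (1_000, 31)]:
    rho = math.sqrt(math.log(1 + d / s ** 2))        # the choice behind (27)
    exact, bound = chi2_exact(d, s, rho), chi2_binomial_bound(d, s, rho)
    assert 0 <= exact <= bound * (1 + 1e-12) and bound <= math.e - 1
```

For $s=1$ the two quantities coincide, since $J$ and $Z$ then have the same Bernoulli distribution.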


Lemma 5. Let $y\sim\mathcal N(a,\sigma^2)$ and $\hat T=y\mathbb 1_{\{|y|>\sigma\tau\}}$ with $\tau>0$. Set $B(a)=\mathbf E(\hat T)-a$. Then there exists $C>0$ such that

$$
|B(a)|\le C\min(|a|,\sigma\tau).
$$

Proof. Note that $B(a)=-\mathbf E\bigl(y\mathbb 1_{\{|y|\le\sigma\tau\}}\bigr)$, so that $|B(a)|\le\sigma\tau$. Thus, it remains to show that there exists $C>0$ such that $|B(a)|\le C|a|$. Indeed, if $|a|>\sigma$, we have

$$
|B(a)|\le\sigma\mathbf E|X|+|a|\le\Bigl(\sqrt{\tfrac2\pi}+1\Bigr)|a|,
$$

where $X\sim\mathcal N(0,1)$. Finally, if $|a|\le\sigma$, the inequality $|B(a)|\le C|a|$ follows from the facts that $B(0)=0$ and $|B'(a)|\le4$ for $|a|\le\sigma$. □
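Lemma 5 can be illustrated with an explicit formula: writing $y=a+\sigma Z$, one has $\mathbf E(y\mathbb 1_{\{|y|\le\sigma\tau\}})=a\,(\Phi(u)-\Phi(l))+\sigma(\varphi(l)-\varphi(u))$ with $l=(-\sigma\tau-a)/\sigma$ and $u=(\sigma\tau-a)/\sigma$. The Python sketch below (helper names ours) evaluates $B(a)$ this way and checks $|B(a)|\le4\min(|a|,\sigma\tau)$ on a grid. The constant 4 is our choice, covering both cases of the proof, since $\sqrt{2/\pi}+1<4$.

```python
import math

def std_normal_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2))

def std_normal_pdf(z):
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def hard_threshold_bias(a, sigma, tau):
    """B(a) = E(y 1{|y| > sigma*tau}) - a = -E(y 1{|y| <= sigma*tau}) for y ~ N(a, sigma^2)."""
    lo, hi = (-sigma * tau - a) / sigma, (sigma * tau - a) / sigma
    truncated_mean = a * (std_normal_cdf(hi) - std_normal_cdf(lo)) + sigma * (
        std_normal_pdf(lo) - std_normal_pdf(hi)
    )
    return -truncated_mean

for sigma in (0.5, 1.0, 3.0):
    for tau in (0.1, 1.0, 4.0):
        for k in range(-400, 401):
            a = k * sigma / 40                     # grid on [-10 sigma, 10 sigma]
            b = hard_threshold_bias(a, sigma, tau)
            assert abs(b) <= 4 * min(abs(a), sigma * tau) + 1e-12
```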


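Lemma 6. Let $y\sim\mathcal N(a,\sigma^2)$, $\bar d\ge2$, $x=2\sqrt{2\log\bar d}$, and $|a|\le\sigma x/2$. Let $\alpha$ be such that $\mathbf E\bigl((X^2-\alpha)\mathbb 1_{\{|X|>x\}}\bigr)=0$, where $X\sim\mathcal N(0,1)$. Then there exists $C>0$ such that

$$
\bigl|\mathbf E\bigl((y^2-\alpha\sigma^2)\mathbb 1_{\{|y|>\sigma x\}}\bigr)\bigr|\le Ca^2.
$$

Proof. Writing $y=a+\sigma X$ and using the definition of $\alpha$, we get

$$
\mathbf E\bigl((y^2-\alpha\sigma^2)\mathbb 1_{\{|y|>\sigma x\}}\bigr)=\sigma^2\mathbf E\bigl((X^2-\alpha)(\mathbb 1_{\{|y|>\sigma x\}}-\mathbb 1_{\{|X|>x\}})\bigr)+2a\sigma\mathbf E\bigl(X\mathbb 1_{\{|y|>\sigma x\}}\bigr)+a^2\mathbf P(|y|>\sigma x).
$$

Lemma 5 implies that

$$
\bigl|\sigma\mathbf E\bigl(X\mathbb 1_{\{|y|>\sigma x\}}\bigr)\bigr|=\bigl|B(a)+a\mathbf P(|y|\le\sigma x)\bigr|\le C|a|.
$$

Therefore, to finish the proof it remains to show the inequality

$$
\sigma^2\bigl|\mathbf E\bigl((X^2-\alpha)(\mathbb 1_{\{|y|>\sigma x\}}-\mathbb 1_{\{|X|>x\}})\bigr)\bigr|\le Ca^2.
\tag{65}
$$

Using the Taylor expansion (the first-order term vanishes because its integrand is odd and the domain $\{|v|>x\}$ is symmetric), we obtain

$$
\mathbf P(|y|>\sigma x)-\mathbf P(|X|>x)=\frac1{\sqrt{2\pi}}\int_{\{|v|>x\}}\Bigl(e^{-(v-a/\sigma)^2/2}-e^{-v^2/2}\Bigr)dv=\frac1{2\sqrt{2\pi}}\Bigl(\frac a\sigma\Bigr)^2\int_{\{|v|>x\}}\Bigl[\Bigl(v-\frac{ta}\sigma\Bigr)^2-1\Bigr]e^{-(v-ta/\sigma)^2/2}dv,
$$

where $0\le t\le1$. By the assumption on $a$, on the set $\{|v|>x\}$ we have $|v|/2\le|v-ta/\sigma|\le3|v|/2$. Hence,

$$
\alpha\bigl|\mathbf P(|y|>\sigma x)-\mathbf P(|X|>x)\bigr|\le\frac\alpha{2\sqrt{2\pi}}\Bigl(\frac a\sigma\Bigr)^2\int_{\{|v|>x\}}\Bigl(\frac{9v^2}4+1\Bigr)e^{-v^2/8}dv\le C\Bigl(\frac a\sigma\Bigr)^2\frac{(\log\bar d)^{3/2}}{\bar d}\le C\Bigl(\frac a\sigma\Bigr)^2,
\tag{66}
$$

where we have used Lemma 4 and the facts that $x\ge2\sqrt{2\log2}$, $\bar d\ge2$, and $\alpha\le40\log\bar d$ (which is proved analogously to (50)). Similarly,

$$
\mathbf E\bigl(X^2(\mathbb 1_{\{|y|>\sigma x\}}-\mathbb 1_{\{|X|>x\}})\bigr)=\frac1{\sqrt{2\pi}}\int_{\{|v|>x\}}\Bigl((v-a/\sigma)^2e^{-(v-a/\sigma)^2/2}-v^2e^{-v^2/2}\Bigr)dv,
$$

and from the Taylor expansion of $v^2e^{-v^2/2}$ and Lemma 4 we deduce, as in (66), that

$$
\bigl|\mathbf E\bigl(X^2(\mathbb 1_{\{|y|>\sigma x\}}-\mathbb 1_{\{|X|>x\}})\bigr)\bigr|\le\frac1{2\sqrt{2\pi}}\Bigl(\frac a\sigma\Bigr)^2\int_{\{|v|>x\}}\Bigl(\Bigl(\frac{3v}2\Bigr)^4+5\Bigl(\frac{3v}2\Bigr)^2+2\Bigr)e^{-v^2/8}dv\le C\Bigl(\frac a\sigma\Bigr)^2\frac{(\log\bar d)^{3/2}}{\bar d}\le C\Bigl(\frac a\sigma\Bigr)^2
$$

(we have used that $(v^2e^{-v^2/2})''=(v^4-5v^2+2)e^{-v^2/2}$). Combining the last display with (66), we obtain (65) and thus the lemma. □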

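Lemma 7. For any $\theta$ such that $\|\theta\|_0\le\sqrt d$ we have

$$
\mathbf E_\theta(\hat\sigma^2)\le9\sigma^2,\qquad\mathbf E_\theta(\hat\sigma^4)\le C\sigma^4,
\tag{67}
$$

and

$$
\mathbf P_\theta(\hat\sigma<\sqrt2\sigma)\le Cd\exp(-\sqrt d/C)
\tag{68}
$$

for some absolute constant $C>0$.

Proof. Since $\|\theta\|_0\le\sqrt d$, we have

$$
\hat\sigma^2\le\frac9d\sum_{j=1}^{d-\|\theta\|_0}y_{(j)}^2.
$$

Denote by $F$ the set of indices $i$ corresponding to the $d-\|\theta\|_0$ smallest values $y_i^2$. Then

$$
\sum_{j=1}^{d-\|\theta\|_0}y_{(j)}^2=\sum_{i\in F}y_i^2=\sigma^2\sum_{i\in S^c}\xi_i^2+\sum_{i\in S\cap F}y_i^2-\sigma^2\sum_{i\in S^c\cap F^c}\xi_i^2,
$$

where $S=\{j:\theta_j\ne0\}$. For any $i\in S\cap F$ and any $j\in S^c\cap F^c$, we have $y_i^2\le\sigma^2\xi_j^2$. Furthermore, $\operatorname{Card}(S\cap F)=\operatorname{Card}(S^c\cap F^c)$. Therefore,

$$
\hat\sigma^2\le\frac{9\sigma^2}d\sum_{i\in S^c}\xi_i^2.
$$

This implies (67). We now prove (68). Let $G$ be the set of indices $i$ corresponding to the $\lfloor d-\sqrt d\rfloor$ smallest values $y_i^2$. Here, $\lfloor x\rfloor$ denotes the largest integer less than or equal to $x$. Then we have

$$
\sum_{j\le d-\sqrt d}y_{(j)}^2=\sum_{i\in G}y_i^2\ge\sigma^2\sum_{i\in S^c\cap G}\xi_i^2\ge\sigma^2\sum_{i\in S^c}\xi_i^2-2\sqrt d\,\sigma^2\max_{i\in S^c}\xi_i^2,
$$

where we have used that $\operatorname{Card}(G^c)\le2\sqrt d$. This implies

$$
\hat\sigma^2\ge\frac{9\sigma^2}d\Bigl(\sum_{i\in S^c}\xi_i^2-2\sqrt d\max_{i\in S^c}\xi_i^2\Bigr).
$$

Thus,

$$
\mathbf P_\theta(\hat\sigma<\sqrt2\sigma)\le\mathbf P\Bigl(9\sigma^2\sum_{i\in S^c}\xi_i^2-18\sqrt d\,\sigma^2\max_{i\in S^c}\xi_i^2<2d\sigma^2\Bigr)\le\mathbf P\Bigl(9\sum_{i\in S^c}\xi_i^2\le3d\Bigr)+\mathbf P\Bigl(18\max_{i\in S^c}\xi_i^2\ge\sqrt d\Bigr).
\tag{69}
$$

The first term on the right-hand side of (69) satisfies

$$
\mathbf P\Bigl(3\sum_{i\in S^c}\xi_i^2\le d\Bigr)\le\mathbf P\bigl(U_D-D\le-2d/3+\sqrt d\bigr),
$$

where $D=\operatorname{Card}(S^c)\ge d-\sqrt d$ and $U_D$ is a $\chi^2$ random variable with $D$ degrees of freedom. A standard bound on the tails of $\chi^2$ random variables (see, e.g., [32]) yields

$$
\mathbf P(U_D-D\le-t)\le\exp\bigl(-t^2/(4D)\bigr),\qquad\forall\,t>0.
$$

Thus, for $d\ge3$, we obtain

$$
\mathbf P\Bigl(3\sum_{i\in S^c}\xi_i^2\le d\Bigr)\le\exp\bigl(-(2d/3-\sqrt d)^2/(4D)\bigr)\le\exp(-d/C),
$$

where $C>0$ is an absolute constant (for $d\le2$, inequality (68) is trivial if $C$ is large enough). Finally, the second term on the right-hand side of (69) satisfies

$$
\mathbf P\Bigl(\max_{i\in S^c}\xi_i^2\ge\frac{\sqrt d}{18}\Bigr)\le2d\exp\Bigl(-\frac{\sqrt d}{36}\Bigr)
$$

in view of (38). Plugging the last two displays into (69), we obtain (68).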
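The bounds of Lemma 7 can be explored by simulation. The estimator $\hat\sigma$ is defined earlier in the paper and is not restated here; the Python sketch below assumes, as the proof suggests, that $\hat\sigma^2=\frac9d\sum_{j\le d-\sqrt d}y_{(j)}^2$. Under that assumption, it checks draw by draw the deterministic inequality $\hat\sigma^2\le\frac{9\sigma^2}d\sum_{i\in S^c}\xi_i^2$ from the proof of (67), and returns the empirical frequency of the event $\{\hat\sigma<\sqrt2\sigma\}$ from (68). The signal configuration is an arbitrary, hypothetical choice.

```python
import math
import random

def sigma_hat(y):
    """Assumed estimator: sqrt((9/d) * sum of the floor(d - sqrt(d)) smallest y_j^2)."""
    d = len(y)
    k = math.floor(d - math.sqrt(d))
    return math.sqrt(9 / d * sum(sorted(v * v for v in y)[:k]))

def simulate_lemma7(d=400, sigma=1.0, reps=200, seed=0):
    """Empirical P(sigma_hat < sqrt(2) sigma), checking the bound behind (67) on every draw."""
    rng = random.Random(seed)
    sparsity = math.isqrt(d)                               # ||theta||_0 <= sqrt(d)
    theta = [100.0] * sparsity + [0.0] * (d - sparsity)    # hypothetical large signal
    below = 0
    for _ in range(reps):
        xi = [rng.gauss(0.0, 1.0) for _ in range(d)]
        y = [t + sigma * e for t, e in zip(theta, xi)]
        s_hat = sigma_hat(y)
        noise_only = 9 * sigma ** 2 / d * sum(e * e for e in xi[sparsity:])
        assert s_hat ** 2 <= noise_only * (1 + 1e-12)     # inequality from the proof of (67)
        below += s_hat < math.sqrt(2) * sigma
    return below / reps
```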